Abstract

We consider the Metropolis algorithm for the distribution $\pi(x) = \theta^{S(x)}(1+\theta)^{-n}$ on the hypercube $\mathcal{X} = \{0,1\}^n$, where $S(x)$ is the number of ones in $x \in \{0,1\}^n$ and $\theta \in (0,1]$ is a constant. For $n = \binom{v}{2}$ this distribution corresponds to the Erdős-Rényi random graph model on $v$ vertices, where each edge is present independently with probability $\frac{\theta}{1+\theta}$. The lazy random walk Metropolis algorithm for this model specifies a Markov chain $(X_t)$ on $\mathcal{X}$ that is known to have cutoff at $\frac{1}{1+\theta} n \log n$ with window size $n$, a result derived by Fourier analysis in Diaconis and Hanlon (1992) and Ross and Xu (1994). In this work we give a new proof of this result that is purely probabilistic. This is done in the hope that probabilistic techniques will be easier to generalize to other, less symmetric distributions $\pi$. Our proof uses coupling and a projection to a two-dimensional Markov chain $X_t \mapsto (S(X_t), d(X_0, X_t))$, where $d(X_0, \cdot)$ is the Hamming distance to the starting state $X_0$.

1 Introduction 

We are interested in analyzing convergence rates of the random walk Metropolis-Hastings algorithm for various distributions $\pi$ on the hypercube $\mathcal{X} := \{0,1\}^n$. This Markov chain on $\mathcal{X}$ moves as follows: given that we are at a state $x \in \mathcal{X}$, we choose one of the $n$ neighbors of $x$ uniformly at random, say $y$, and propose to move from $x$ to $y$. Here $x$ is called a neighbor of $y$, denoted by $x \sim y$, whenever $x$ and $y$ differ in exactly one coordinate. This proposal gets accepted, i.e. we move to $y$, with probability $\min\left(1, \frac{\pi(y)}{\pi(x)}\right)$. If it gets rejected we stay at $x$. This transition rule ensures that we have detailed balance

$$\pi(x)\, p(x,y) = \pi(y)\, p(y,x) \quad \text{for all } x, y \in \mathcal{X},$$

and since the chain is irreducible, its unique stationary distribution is $\pi$.

Often it is convenient to make the chain lazy. This corresponds to flipping a fair coin independently at each step. If it comes up heads, we stay where we are; if it comes up tails, we move according to the rule specified above. If $P$ is the transition probability matrix of the original chain, then the lazy chain has transition probability matrix $P' := \frac{P + I}{2}$, where $I$ is the identity matrix.

All the stationary distributions $\pi$ we will investigate are going to be strictly unimodal. By this we mean that there exists a state, $z$ say, that has higher mass under $\pi$ than any other state, and the $\pi$-mass decreases strictly with distance to $z$, i.e.

$$\pi(x) < \pi(y) \quad \text{whenever } d(x,z) > d(y,z).$$

Here, $d(x,z) := \sum_{i=1}^n |x_i - z_i|$ is the graph distance of $x$ and $z$, i.e. the number of coordinates where $x$ and $z$ differ. We also require $\pi$ to be radially symmetric (with respect to the mode $z$), i.e.

$$\pi(x) = \pi(y) \quad \text{whenever } d(x,z) = d(y,z).$$

By relabeling the states we may (and will) assume that the mode $z$ of $\pi$ is at $0 = (0, 0, \dots, 0)$. This ensures that $\pi$ is constant on level sets $L(k) := \{x \in \mathcal{X} : S(x) = k\}$, where $k \in \{0, 1, \dots, n\}$ and $S(x) := \sum_{i=1}^n x_i$ is the number of ones in $x$. Hence, in the Metropolis algorithm we will always accept downward moves $x \to y$ where $S(x) > S(y)$, and we will accept upward moves $x \to y$ where $S(x) < S(y)$ with probability $\theta_{S(x)}$, where we write $\theta_k := \frac{\pi(v)}{\pi(w)}$ for some (any) states $v, w$ such that $S(v) = k+1$ and $S(w) = k$.

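For concreteness, the transition kernel for the lazy random walk Metropolis-Hastings algorithm is

$$p(x,y) = \begin{cases} \frac{1}{2n} & x \sim y,\ S(x) > S(y), \\ \frac{1}{2n}\,\theta_{S(x)} & x \sim y,\ S(x) < S(y), \\ \frac{1}{2} + \frac{n - S(x)}{2n}\left(1 - \theta_{S(x)}\right) & x = y, \\ 0 & \text{otherwise.} \end{cases}$$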
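As a sanity check on the kernel above, the following minimal sketch builds the lazy chain for a small hypercube with constant acceptance ratio $\theta$ and verifies detailed balance with exact rational arithmetic. The function names and the values $n = 4$, $\theta = 1/3$ are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction
from itertools import product

def lazy_metropolis_kernel(n, theta):
    """Return (states, P) for the lazy chain with constant upward acceptance theta."""
    states = list(product((0, 1), repeat=n))
    P = {}
    for x in states:
        row = {}
        stay = Fraction(1, 2)  # laziness: hold with probability 1/2
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            # downward moves (1 -> 0) are always accepted, upward moves with probability theta
            accept = Fraction(1) if x[i] == 1 else theta
            row[y] = accept / (2 * n)
            stay += (1 - accept) / (2 * n)  # rejected proposals also hold
        row[x] = stay
        P[x] = row
    return states, P

def stationary(n, theta):
    """pi(x) = theta**S(x) * (1 + theta)**(-n)."""
    return {x: theta ** sum(x) / (1 + theta) ** n
            for x in product((0, 1), repeat=n)}

n, theta = 4, Fraction(1, 3)  # illustrative values
states, P = lazy_metropolis_kernel(n, theta)
pi = stationary(n, theta)
rows_ok = all(sum(P[x].values()) == 1 for x in states)
balance_ok = all(pi[x] * p == pi[y] * P[y][x]
                 for x in states for y, p in P[x].items())
print(rows_ok, balance_ok)
```

Using `Fraction` keeps the check exact, so the equalities are tested without floating-point tolerance.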



Note that a consequence of radial symmetry is that the projection $S(X_t)$ of the Markov chain $(X_t)$ is also Markov, since for all $x, y$ we then have

$$P(x, [y]) = P(x', [y]) \quad \text{for all } x' \sim_S x.$$

Here we write $x \sim_S y$ whenever $S(x) = S(y)$, and $[y] := \{z : S(z) = S(y)\}$ for $y \in \mathcal{X}$ are the equivalence classes of the relation $\sim_S$.

We measure distance to stationarity by total variation:

$$\|P^t(x,\cdot) - \pi\|_{TV} := \max_{A \subset \mathcal{X}} \left(P^t(x,A) - \pi(A)\right),$$

and we are interested in this distance from the worst starting point:

$$d(t) := \max_{x \in \mathcal{X}} \|P^t(x,\cdot) - \pi\|_{TV}.$$

The mixing time for a parameter $\varepsilon \in (0,1)$ is defined as

$$t_{mix}(\varepsilon) := \min\{t \ge 0 : d(t) \le \varepsilon\},$$

and we write $t_{mix}$ for $t_{mix}(1/4)$. We are interested in the behavior of $t_{mix}$ as $n$ goes to infinity.

An interesting phenomenon is that for some chains the mixing time $t_{mix}(\varepsilon)$ doesn't depend on the parameter $\varepsilon$ (asymptotically, as $n$ goes to infinity). We say a sequence $(X^{(n)})_{n \in \mathbb{N}}$ of Markov chains $X^{(n)} = (X^{(n)}_t)_{t=0,1,\dots}$ on $\{0,1\}^n$ has a cutoff (at $t^{(n)}_{mix}$) if, for all $\varepsilon > 0$,

$$\lim_{n \to \infty} \frac{t^{(n)}_{mix}(\varepsilon)}{t^{(n)}_{mix}(1 - \varepsilon)} = 1. \tag{2}$$

Here, $t^{(n)}_{mix}(\varepsilon)$ denotes the $\varepsilon$-mixing time of the $n$th chain $(X^{(n)}_t)_{t=0,1,\dots}$. This is equivalent to

$$\lim_{n \to \infty} d_n\left(c\, t^{(n)}_{mix}\right) = \begin{cases} 1 & \text{if } c < 1, \\ 0 & \text{if } c > 1. \end{cases} \tag{3}$$

So the function $d_n(\cdot)$, the total variation distance to stationarity from the worst starting point for the $n$th chain, approaches a step function as $n$ goes to infinity (if we rescale time by $t^{(n)}_{mix}$). For a proof of this equivalence see Levin et al. (2009, Lemma 18.1, page 247), from where we also borrowed the notation. For an overview of this cutoff phenomenon, see Diaconis (1996).
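To make the step-function picture concrete, here is a small numerical sketch. It evolves the law of the number of ones $S(X_t)$ for the lazy chain with constant $\theta$, started from the all-ones state, and compares it with its Binomial stationary law at times $c \cdot \frac{1}{1+\theta} n \log n$. This projected distance only lower-bounds $d_n(t)$; the values $n = 200$, $\theta = 0.5$ and the function name are assumptions made for illustration.

```python
import math

def projected_tv(n, theta, t):
    """TV distance between the law of S(X_t), started at n, and Binomial(n, theta/(1+theta))."""
    p = theta / (1 + theta)
    target = [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    dist = [0.0] * (n + 1)
    dist[n] = 1.0
    for _ in range(t):
        new = [0.0] * (n + 1)
        for k, mass in enumerate(dist):
            if mass == 0.0:
                continue
            up = (1 - k / n) * theta / 2   # propose a zero coordinate, accept with prob. theta
            down = k / (2 * n)             # propose a one coordinate, always accepted
            if k < n:
                new[k + 1] += mass * up
            if k > 0:
                new[k - 1] += mass * down
            new[k] += mass * (1 - up - down)
        dist = new
    return 0.5 * sum(abs(a - b) for a, b in zip(dist, target))

n, theta = 200, 0.5  # illustrative values
scale = n * math.log(n) / (1 + theta)
profile = {c: projected_tv(n, theta, int(c * scale)) for c in (0.5, 1.0, 2.0)}
print(profile)
```

For moderate $n$ the drop is gradual rather than sharp; the distance is close to one well before the scaled time $c = 1$ and small well after it.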

The following result is well known and not hard to prove: 

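Proposition 1.1. For a sequence of finite Markov chains, the following are equivalent:

(i) The sequence has a cutoff at $t^{(n)}_{mix}$.

(ii) For all $\varepsilon \in (0,1)$ we have $\lim_{n \to \infty} t^{(n)}_{mix}(\varepsilon) / t^{(n)}_{mix} = 1$.

(iii) For all $\varepsilon \in (0,1)$ we have $t^{(n)}_{mix}(\varepsilon) = t^{(n)}_{mix}\left[1 + h(n,\varepsilon)\right]$ for some $h(n,\varepsilon)$ with $\lim_{n \to \infty} h(n,\varepsilon) = 0$.

(iv) For all $\varepsilon, \varepsilon' \in (0,1)$ we have $t^{(n)}_{mix}(\varepsilon) \sim t^{(n)}_{mix}(\varepsilon')$.

(v) $\lim_{n \to \infty} d_n\left(c\, t^{(n)}_{mix}\right) = \begin{cases} 1 & \text{if } c < 1, \\ 0 & \text{if } c > 1. \end{cases}$

(vi) $\lim_{n \to \infty} d_n\left(c\, t^{(n)}_{mix}(\varepsilon)\right) = \begin{cases} 1 & \text{if } c < 1, \\ 0 & \text{if } c > 1, \end{cases}$ for some $\varepsilon \in (0,1)$.

(vii) $\lim_{n \to \infty} d_n\left(c\, t^{(n)}_{mix}(\varepsilon)\right) = \begin{cases} 1 & \text{if } c < 1, \\ 0 & \text{if } c > 1, \end{cases}$ for all $\varepsilon \in (0,1)$.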

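Sometimes we can analyze more precisely what happens for $c = 1$ in (3). We say a sequence of Markov chains has a cutoff with window size $(w_n)$ if $w_n = o\left(t^{(n)}_{mix}\right)$ and

$$\lim_{\alpha \to \infty} \liminf_{n \to \infty} d_n\left(t^{(n)}_{mix} - \alpha w_n\right) = 1, \tag{4}$$
$$\lim_{\alpha \to \infty} \limsup_{n \to \infty} d_n\left(t^{(n)}_{mix} + \alpha w_n\right) = 0. \tag{5}$$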



For an introduction to Markov chains and mixing time see the highly recommended book by 
Levin, Peres and Wilmer (Levin et al., 2009), from which we borrow heavily. 



2 The Erdős-Rényi random graph model

The easiest model in the class of rotationally symmetric and strictly unimodal distributions $\pi$ on the hypercube arises when we have $\theta_k = \theta \in (0,1]$ for all $k$, i.e. the acceptance probabilities for upward moves are constant across level sets. That is, $\pi(x) = \theta^{S(x)}(1+\theta)^{-n}$. (The case where $\theta > 1$ would correspond to the mode of $\pi$ being at $1 = (1, 1, \dots, 1)$ instead of $0 = (0, 0, \dots, 0)$. By symmetry, this gives rise to nothing new, so we will assume $\theta \in (0,1)$ henceforth. The case where $\theta = 1$ corresponds to $\pi$ being the uniform distribution.)

If we have $n = \binom{v}{2}$ and identify the list of coordinates of the hypercube with the list of potential edges of a graph on $v$ vertices, then the hypercube represents the space of all possible (undirected) graphs on $v$ vertices. (A one indicates that a certain edge is present in the graph; a zero indicates that it is absent.) Since

$$\pi(x) = \left(\frac{\theta}{1+\theta}\right)^{S(x)} \left(\frac{1}{1+\theta}\right)^{n - S(x)},$$

the distribution $\pi$ corresponds to the Erdős-Rényi random graph model with parameter $\frac{\theta}{1+\theta}$. This is the probability distribution on (undirected) graphs on $v$ vertices where each of the $n = \binom{v}{2}$ potential edges is present independently with probability $p := \frac{\theta}{1+\theta}$.

The non-lazy version of the random walk Metropolis chain for this model has cutoff at $\frac{1}{2(1+\theta)} n \log n$ with window size $n$. This was derived using Fourier analysis in Diaconis and Hanlon (1992, Theorem 2, page 104) together with Ross and Xu (1994, Theorem 4.2, page 824). Diaconis and Hanlon studied the projection $S(X_t)$ and explicitly calculated the eigenvalues and eigenvectors of the transition kernel. This establishes the lower bound. Ross and Xu showed that the Metropolis chain for this model on the hypercube can be viewed as a random walk on a hypergroup deformation of the hypercube. Fourier analysis of this random walk then leads to the result for the upper bound. Another proof using representation theory and Iwahori-Hecke algebras is given by Diaconis and Ram (2000, Theorem 5.4, page 177). See also Diaconis and Saloff-Coste (2006, page 2117) for a discussion of these and some related results.

For the lazy version of this chain we therefore expect cutoff at $\frac{1}{1+\theta} n \log n$ with window size $n$. In this paper we will give an alternative proof of this result. This is done in the hope that a more probabilistic proof will be easier to generalize to less symmetric models, where Fourier analysis might be harder to apply. To be specific, we will prove the following result:

Theorem 2.1. The lazy random walk Metropolis chain for $\pi(x) = \theta^{S(x)}(1+\theta)^{-n}$ on $\{0,1\}^n$ has cutoff at $\frac{1}{1+\theta} n \log n$ with a window of size $n$.

Corollary 2.2. Let $n := \binom{v}{2}$ and let $\pi(x) = p^{S(x)}(1-p)^{n - S(x)}$ be the Erdős-Rényi random graph model on $v$ vertices with parameter $p \in (0,1)$. The lazy random walk Metropolis chain for this model has cutoff at $\max\{p, 1-p\}\, n \log n$ with a window of size $n$.

Proof of Corollary: As mentioned above, by relabeling states (switching zeros and ones) we may assume $p \in (0, 1/2]$. Let $\theta := \frac{p}{1-p}$. Since

$$\theta^{S(x)}(1+\theta)^{-n} = \left(\frac{\theta}{1+\theta}\right)^{S(x)} \left(\frac{1}{1+\theta}\right)^{n - S(x)} = p^{S(x)}(1-p)^{n - S(x)}$$

and $\max\{p, 1-p\} = 1 - p = \frac{1}{1+\theta}$, the result follows from the Theorem. □

3 Lower bound 

Our proof for the lower bound part of Theorem 2.1 mimics the one given in Levin et al. (2009, Proposition 7.13 on page 95) for the case where the stationary distribution $\pi$ is uniform ($\theta = 1$). It is based on the method of distinguishing statistics, described for example in Levin et al. (2009). Their Proposition 7.8 on page 92 is this:

Proposition 3.1. Let $\mu$ and $\nu$ be two probability distributions on $\mathcal{X}$, and let $S$ be a real-valued function on $\mathcal{X}$. If

$$|E_\mu(S) - E_\nu(S)| \ge r\sigma,$$

where $\sigma^2 = [\mathrm{Var}_\mu(S) + \mathrm{Var}_\nu(S)]/2$, then

$$\|\mu - \nu\|_{TV} \ge 1 - \frac{4}{4 + r^2}.$$

Here $E_\mu(S) := \sum_{x \in \mathcal{X}} S(x)\mu(x)$ denotes the expectation of $S$ under $\mu$, and likewise for $\nu$. So if we can find a real function on the state space $\mathcal{X}$ such that its expectations under $P^t(x,\cdot)$ and $\pi$ are still very different (on the scale of their average variance) after $t$ steps of the chain, then we have demonstrated that $\|P^t(x,\cdot) - \pi\|$ must still be large.

A natural choice for the distinguishing statistic is the number of ones in a state, $S(x) := \sum_{i=1}^n x_i$ for $x \in \{0,1\}^n$. Therefore we have to analyze the one-dimensional projection $S(X_t) =: V_t$ of our Markov chain $(X_t)$. As mentioned in the introduction, this is again a Markov chain whose transition probabilities satisfy $P(k,l) = P(x,\cdot)S^{-1}(l) = P(x,[y])$ for any $x, y$ with $S(x) = k$ and $S(y) = l$. As before, $[y] = \{z \in \{0,1\}^n : S(z) = S(y)\}$ denotes the equivalence class of all states with the same number of ones as $y$. (Because of their different domains it should not lead to confusion that we are using the same notation $P(\cdot,\cdot)$ for the transition probabilities of the original chain and the projected chain.)

Similarly, the stationary distribution $\pi_S$ of $(V_t)$ is the push-forward $\pi_S := \pi S^{-1}$ of $\pi$ under $S$. This entails $\pi_S(k) = \binom{n}{k}\theta^k(1+\theta)^{-n} = \binom{n}{k}\left(\frac{\theta}{1+\theta}\right)^k\left(1 - \frac{\theta}{1+\theta}\right)^{n-k} = \mathrm{Binomial}\left(n, \frac{\theta}{1+\theta}\right)(k)$. Therefore we get the expectation and variance of $V \sim \pi_S$ as

$$E_{\pi_S} V = \frac{n\theta}{1+\theta}, \qquad \mathrm{Var}_{\pi_S} V = \frac{n\theta}{(1+\theta)^2}.$$

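The chain $(V_t)$ is a birth and death chain on $\{0, 1, \dots, n\}$ with transition probabilities

$$p_k := P(V_{t+1} = k+1 \mid V_t = k) = \left(1 - \frac{k}{n}\right)\frac{\theta}{2},$$
$$r_k := P(V_{t+1} = k \mid V_t = k) = \frac{1}{2} + \left(1 - \frac{k}{n}\right)\frac{1-\theta}{2}, \tag{7}$$
$$q_k := P(V_{t+1} = k-1 \mid V_t = k) = \frac{k}{2n}.$$

We begin by calculating its expectation after $t$ steps starting from $k$. Note that for all $t$ we get

$$V_{t+1} - V_t = \begin{cases} 1 & \text{with probability } \left(1 - \frac{V_t}{n}\right)\frac{\theta}{2}, \\ -1 & \text{with probability } \frac{V_t}{2n}, \end{cases}$$

and $V_{t+1} - V_t = 0$ otherwise. So

$$E[V_{t+1} - V_t \mid V_t] = \left(1 - \frac{V_t}{n}\right)\frac{\theta}{2} - \frac{V_t}{2n} = \frac{\theta}{2} - \frac{1+\theta}{2n} V_t,$$

and therefore

$$E[V_{t+1} \mid V_t] = \frac{\theta}{2} - \frac{1+\theta}{2n} V_t + E[V_t \mid V_t] = \frac{\theta}{2} + \left(1 - \frac{1+\theta}{2n}\right) V_t.$$

By taking expectation $E_k$ with respect to the starting state $k$ we get for all $t, k$ that

$$E_k(V_{t+1}) = \frac{\theta}{2} + \left(1 - \frac{1+\theta}{2n}\right) E_k(V_t). \tag{8}$$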
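This leads to the following result:

Proposition 3.2. The projected chain $V_t := S(X_t)$ of our Metropolis chain $(X_t)$ has for all $k \in \{0, 1, \dots, n\}$ and all $t \in \mathbb{N}$

$$E_k(V_t) = \gamma^t\left(k - \frac{n\theta}{1+\theta}\right) + \frac{n\theta}{1+\theta}, \tag{9}$$

where $\gamma := 1 - \frac{1+\theta}{2n}$.

Proof: By induction on $t$. Fix any starting state $k \in \{0, 1, \dots, n\}$. The statement is true for $t = 0$ since $E_k(V_0) = k$. Now suppose it is true for $t$. Then by (8) and the induction hypothesis we get that

$$\begin{aligned} E_k(V_{t+1}) &= \frac{\theta}{2} + \gamma\, E_k(V_t) \\ &= \frac{\theta}{2} + \gamma\left[\gamma^t\left(k - \frac{n\theta}{1+\theta}\right) + \frac{n\theta}{1+\theta}\right] \\ &= \gamma^{t+1}\left(k - \frac{n\theta}{1+\theta}\right) + \frac{\theta}{2} + \left(1 - \frac{1+\theta}{2n}\right)\frac{n\theta}{1+\theta} \\ &= \gamma^{t+1}\left(k - \frac{n\theta}{1+\theta}\right) + \frac{n\theta}{1+\theta}, \end{aligned}$$

so it is also true for $t + 1$. □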
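The closed form (9) can be checked against the birth and death dynamics directly. The sketch below evolves the exact law of $V_t$ with rational arithmetic and compares its mean with the formula; the function names and the values $n = 6$, $\theta = 2/5$, $k = 6$ are illustrative assumptions.

```python
from fractions import Fraction

def step_distribution(dist, n, theta):
    """One step of the birth and death chain V_t = S(X_t) on {0, ..., n}."""
    new = [Fraction(0)] * (n + 1)
    for k, mass in enumerate(dist):
        up = (1 - Fraction(k, n)) * theta / 2   # p_k
        down = Fraction(k, 2 * n)               # q_k
        if k < n:
            new[k + 1] += mass * up
        if k > 0:
            new[k - 1] += mass * down
        new[k] += mass * (1 - up - down)        # r_k
    return new

def mean_closed_form(k, t, n, theta):
    """E_k(V_t) = gamma**t * (k - n*theta/(1+theta)) + n*theta/(1+theta)."""
    gamma = 1 - (1 + theta) / (2 * n)
    center = n * theta / (1 + theta)
    return gamma ** t * (k - center) + center

n, theta, k, t_max = 6, Fraction(2, 5), 6, 8  # illustrative values
dist = [Fraction(0)] * (n + 1)
dist[k] = Fraction(1)
matches = []
for t in range(t_max + 1):
    exact_mean = sum(j * m for j, m in enumerate(dist))
    matches.append(exact_mean == mean_closed_form(k, t, n, theta))
    dist = step_distribution(dist, n, theta)
print(all(matches))
```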
Remark 3.3: Note that for $u := u_{n,\theta} := \frac{1}{1+\theta} n \log n$ we get

$$\gamma^u \sim n^{-1/2},$$

by which we mean that $\lim_{n \to \infty} \gamma^u / n^{-1/2} = 1$.



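Proof: Using $1 - x \le e^{-x}$ for $x \in \mathbb{R}$ we get

$$\lim_{n} \frac{\gamma^u}{n^{-1/2}} = \lim_{n} \sqrt{n}\left(1 - \frac{1+\theta}{2n}\right)^{\frac{1}{1+\theta} n \log n} \le \lim_{n} \sqrt{n} \exp\left\{-\frac{1+\theta}{2n} \cdot \frac{1}{1+\theta}\, n \log n\right\} = \lim_{n} \sqrt{n} \exp\left\{-\frac{1}{2} \log n\right\} = 1.$$

For the other direction we expand the natural logarithm about one to get

$$-\log\left(1 - \frac{1+\theta}{2n}\right) = \sum_{k=1}^{\infty} \frac{1}{k}\left(\frac{1+\theta}{2n}\right)^k = \frac{1+\theta}{2n} + \frac{1}{2}\left(\frac{1+\theta}{2n}\right)^2 + \sum_{k=3}^{\infty} \frac{1}{k}\left(\frac{1+\theta}{2n}\right)^k = \frac{1+\theta}{2n} + O(1/n^2).$$

This means that for some positive constant $C$ we get

$$\begin{aligned} \lim_{n} \frac{\gamma^u}{n^{-1/2}} &= \lim_{n} \sqrt{n} \exp\left\{\frac{1}{1+\theta}\, n \log(n) \log\left(1 - \frac{1+\theta}{2n}\right)\right\} \\ &\ge \lim_{n} \sqrt{n} \exp\left\{-\frac{1}{1+\theta}\, n \log(n) \left[\frac{1+\theta}{2n} + \frac{C}{n^2}\right]\right\} \\ &= \lim_{n} \sqrt{n}\; n^{-1/2} \exp\left\{-\frac{1}{1+\theta} \cdot \frac{C}{n} \log n\right\} \\ &= 1. \quad □ \end{aligned}$$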
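The asymptotic relation in Remark 3.3 is easy to observe numerically. The following sketch evaluates the ratio $\gamma^u / n^{-1/2}$ in floating point for growing $n$; the choice $\theta = 0.5$ and the helper name are assumptions made for illustration.

```python
import math

def ratio(n, theta):
    """gamma**u / n**(-1/2) with gamma = 1 - (1+theta)/(2n) and u = n*log(n)/(1+theta)."""
    gamma = 1.0 - (1.0 + theta) / (2.0 * n)
    u = n * math.log(n) / (1.0 + theta)
    return math.sqrt(n) * gamma ** u

# n = 10^2, ..., 10^6 with an illustrative theta
values = [ratio(10 ** j, 0.5) for j in range(2, 7)]
print(values)
```

The ratio stays just below one, as the inequality $1 - x \le e^{-x}$ predicts, and approaches one as $n$ grows.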

It remains to calculate the variance of $V_t$. Since we want a lower bound on

$$d(t) = \sup_x \|P^t(x,\cdot) - \pi\| \ge \|P^t(1,\cdot) - \pi\|,$$

it's enough to consider the starting state $1 = (1, \dots, 1)$ of all ones. For this, first note that we can run the chain $X_t$ in the following way. Given we are at state $X_t$ at time $t$:

• Pick a coordinate $i \in [n]$ uniformly at random.

• Draw $U_t \sim \mathrm{Uniform}[0,1]$, independent of everything else.

• Set $X^{(j)}_{t+1} = X^{(j)}_t$ for $j \ne i$, and set the $i$th coordinate of $X_{t+1}$ to

$$X^{(i)}_{t+1} = \begin{cases} 1 & \text{if } 0 \le U_t \le \frac{\theta}{2}, \\ 0 & \text{if } \frac{\theta}{2} < U_t \le \frac{1+\theta}{2}, \\ X^{(i)}_t & \text{if } \frac{1+\theta}{2} < U_t \le 1. \end{cases}$$

Now say that a coordinate $j$ has been refreshed by time $t$ if coordinate $j$ was selected at some time $s < t$ and $U_s \in \left[0, \frac{1+\theta}{2}\right]$. Let $R_t$ be the number of coordinates not refreshed by time $t$. We can study the expectation and variance of $R_t$ with a natural modification of the classical coupon collector problem. This leads to the following result, analogous to Lemma 7.12 on page 94 in Levin et al. (2009):

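Proposition 3.4. Consider the coupon collector problem with $n$ distinct coupon types, where at each trial, with probability $\frac{1-\theta}{2}$ we get no coupon, and with probability $\frac{1+\theta}{2}$ we get a coupon chosen (independently and) uniformly at random. Let $I_j(t)$ be the indicator of the event that the $j$th coupon has not been collected by time $t$. Let $R_t := \sum_{j=1}^n I_j(t)$ be the number of coupon types not collected by time $t$. The random variables $I_j(t)$ are negatively correlated, and letting $\gamma^t := \left(1 - \frac{1+\theta}{2n}\right)^t$, we get for $t \ge 0$ that

$$E(R_t) = n\gamma^t,$$
$$\mathrm{Var}(R_t) \le n\gamma^t(1 - \gamma^t) \le n\gamma^t.$$

Proof: By definition of $I_j(t)$ we get

$$E[I_j(t)] = \left(1 - \frac{1+\theta}{2n}\right)^t = \gamma^t$$

and

$$\mathrm{Var}[I_j(t)] = E[(I_j(t))^2] - (E[I_j(t)])^2 = \gamma^t - \gamma^{2t} = \gamma^t(1 - \gamma^t).$$

Similarly, for $j \ne k$ we get

$$E[I_j(t)\, I_k(t)] = P\{\text{no coupon of type } j \text{ or } k \text{ in trials } 1, \dots, t\} = \left(1 - \frac{1+\theta}{2} \cdot \frac{2}{n}\right)^t = \left(1 - \frac{1+\theta}{n}\right)^t,$$

so

$$\mathrm{Cov}[I_j(t), I_k(t)] = E[I_j(t) I_k(t)] - E[I_j(t)]\,E[I_k(t)] = \left(1 - \frac{1+\theta}{n}\right)^t - \left(1 - \frac{1+\theta}{2n}\right)^{2t} \le 0.$$

Therefore

$$E[R_t] = \sum_{j=1}^n E[I_j(t)] = n\gamma^t$$

and

$$\mathrm{Var}[R_t] = \sum_{j=1}^n \mathrm{Var}[I_j(t)] + \sum_{j \ne k} \mathrm{Cov}[I_j(t), I_k(t)] \le n\gamma^t(1 - \gamma^t). \quad □$$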
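The modified coupon collector can also be simulated. The sketch below compares an empirical mean of $R_t$ with $n\gamma^t$ and evaluates the exact variance from the indicator moments used in the proof. The seed, the sample size and the values $n = 20$, $\theta = 0.5$, $t = 30$ are arbitrary illustrative choices.

```python
import random

def simulate_uncollected(n, theta, t, rng):
    """Number of coupon types not collected after t trials of the modified collector."""
    collected = [False] * n
    for _ in range(t):
        if rng.random() < (1 + theta) / 2:      # a coupon is handed out in this trial
            collected[rng.randrange(n)] = True  # its type is uniform on the n types
    return n - sum(collected)

def exact_mean_and_variance(n, theta, t):
    """E(R_t) and Var(R_t) computed from the indicator moments."""
    gamma_t = (1 - (1 + theta) / (2 * n)) ** t
    pair = (1 - (1 + theta) / n) ** t           # E[I_j(t) I_k(t)] for j != k
    mean = n * gamma_t
    var = n * gamma_t * (1 - gamma_t) + n * (n - 1) * (pair - gamma_t ** 2)
    return mean, var

n, theta, t = 20, 0.5, 30  # illustrative values
rng = random.Random(0)
runs = [simulate_uncollected(n, theta, t, rng) for _ in range(20000)]
empirical_mean = sum(runs) / len(runs)
mean, var = exact_mean_and_variance(n, theta, t)
bound = mean * (1 - mean / n)                   # n * gamma^t * (1 - gamma^t)
print(empirical_mean, mean, var, bound)
```

The exact variance falls below the bound because the pairwise covariances are negative, which is the negative correlation stated in the proposition.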

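If we start the chain at $X_0 = 1$, then the conditional distribution of $V_t = S(X_t)$ given $R_t = r$ is the same as that of $r + B$, where $B \sim \mathrm{Binomial}\left(n - r, \frac{\theta}{1+\theta}\right)$. Therefore,

$$E_1[V_t \mid R_t] = R_t + (n - R_t)\frac{\theta}{1+\theta} = \frac{R_t + n\theta}{1+\theta},$$

so by taking expectation we get

$$E_1[V_t] = \frac{n\gamma^t + n\theta}{1+\theta} = \gamma^t\left(n - \frac{n\theta}{1+\theta}\right) + \frac{n\theta}{1+\theta},$$

confirming our result for general starting states $k$ from above for the special case $k = n$. Furthermore, since

$$\mathrm{Var}_1[V_t] = \mathrm{Var}\left[E_1(V_t \mid R_t)\right] + E\left[\mathrm{Var}_1(V_t \mid R_t)\right],$$

we get

$$\begin{aligned} \mathrm{Var}_1[V_t] &= \mathrm{Var}\left[\frac{R_t + n\theta}{1+\theta}\right] + E\left[\mathrm{Var}\left(\mathrm{Binomial}\left(n - R_t, \tfrac{\theta}{1+\theta}\right)\right)\right] \\ &= \frac{1}{(1+\theta)^2}\mathrm{Var}[R_t] + \frac{(n - E[R_t])\,\theta}{(1+\theta)^2} \\ &\le \frac{1}{(1+\theta)^2}\left[n\gamma^t + (n - n\gamma^t)\theta\right] \\ &\le \frac{n(\theta + \gamma^t)}{(1+\theta)^2}. \end{aligned}$$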

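To apply Proposition 3.1, observe that

$$\sigma^2 = \frac{\mathrm{Var}_1 V_t + \mathrm{Var}_{\pi_S} V}{2} \le \max\{\mathrm{Var}_1 V_t, \mathrm{Var}_{\pi_S} V\} \le \frac{n(\theta + \gamma^t)}{(1+\theta)^2} \le \frac{n}{1+\theta}.$$

So for $t := t_{n,\alpha} := \frac{1}{1+\theta} n \log n - \alpha n$ and $\gamma^t = \left(1 - \frac{1+\theta}{2n}\right)^t$ we get for any fixed $\varepsilon > 0$ and large $n$ that

$$\begin{aligned} |E_1 V_t - E_{\pi_S} V| &= \gamma^t\left(n - \frac{n\theta}{1+\theta}\right) = \frac{n}{1+\theta}\,\gamma^t \\ &\ge \sigma\sqrt{n}\,\frac{1}{\sqrt{1+\theta}}\left(1 - \frac{1+\theta}{2n}\right)^{\frac{1}{1+\theta} n \log n - \alpha n} \\ &\ge \sigma\sqrt{n}\,\frac{1-\varepsilon}{\sqrt{1+\theta}} \exp\left\{-\frac{1+\theta}{2}\left[\frac{1}{1+\theta}\log n - \alpha\right]\right\} \\ &= \sigma\,\frac{1-\varepsilon}{\sqrt{1+\theta}} \exp\left\{\alpha\,\frac{1+\theta}{2}\right\}. \end{aligned}$$

By Proposition 3.1 this means $d(t) \ge \|P^t(1,\cdot) - \pi\| \ge 1 - \frac{4}{4 + r^2}$ with $r := \frac{1-\varepsilon}{\sqrt{1+\theta}} \exp\left\{\alpha\frac{1+\theta}{2}\right\}$. Therefore

$$\lim_{\alpha \to \infty} \liminf_{n \to \infty} d(t_{n,\alpha}) \ge \lim_{\alpha \to \infty} \left(1 - \frac{4}{4 + r^2}\right) = 1.$$

This finishes the proof of the lower bound part of Theorem 2.1. □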



4 Lower bound, alternative proof 

The previous proof for the lower bound part of Theorem 2.1 might be hard to generalize to cases where $\theta_k$ is not constant in $k$. (Recall that $\theta_k := \frac{\pi(x)}{\pi(y)}$ for any $x, y$ such that $S(x) = k+1$, $S(y) = k$.) Therefore, we give here an alternative proof for a slightly weaker result.

Proposition 4.1. Let $(X_t)$ be the lazy random walk Metropolis chain for $\pi(x) = \theta^{S(x)}(1+\theta)^{-n}$ on $\{0,1\}^n$. Then

$$\lim_{n \to \infty} d_n\left(C\,\frac{1}{1+\theta}\, n \log n\right) = 1 \quad \text{for all } C < 1. \tag{10}$$

Note that (10) establishes the lower bound part of the result that this chain has cutoff at $\frac{1}{1+\theta} n \log n$. It does not give the window size though.

For this proof we use a modified version of the method of distinguishing statistics (see Proposition 3.1) that avoids the need to calculate or approximate the variance of the statistic $S$ under $P^t(x,\cdot)$. Instead, we use the fact that our chain only makes local moves to argue that $S(X_t) = V_t$ must be concentrated about its mean under $P^t(x,\cdot)$. To do this, we use Azuma's inequality:

Theorem 4.2 (Azuma's Inequality). Let $(Y_i)_{i=0,1,\dots}$ be a martingale with bounded differences, i.e.

$$a_i \le Y_i - Y_{i-1} \le b_i \quad \text{for some } a_i, b_i \in \mathbb{R} \text{ and all } i \in \mathbb{N}.$$

Then for any $t \in \mathbb{N}$ and $s > 0$ we get

$$P\{Y_t - Y_0 \ge s\} \le e^{-2s^2/c} \quad \text{and}$$
$$P\{Y_t - Y_0 \le -s\} \le e^{-2s^2/c},$$

where $c := \sum_{i=1}^t (b_i - a_i)^2$.

For a proof see for example Dubhashi & Panconesi (2009, Theorem 5.2, page 67) or Ross (1996, Theorem 6.3.3, page 307). Note that both inequalities are strict unless $Y_t$ is constant a.s., since this is true for Markov's inequality (applied to an exponential function of $Y_t - Y_0$), on which the proof is based.

Now we can't apply this result directly in our situation, since $V_i = S(X_i)$ is not a martingale. However, by the tower property of conditional expectation, $Y_i := E[V_t \mid V_{0:i}]$ for $i = 0, 1, \dots, t$ gives a martingale with respect to $(V_i)$. Here we write $V_{k:l} := (V_i)_{k \le i \le l}$. This immediately gives the following result, sometimes called the method of averaged bounded differences (e.g. in Dubhashi & Panconesi, 2009, Theorem 5.3 on page 68).

Proposition 4.3. Let $V_0 = v \in \mathbb{R}$ and $t \in \mathbb{N}$ be fixed and let $V_1, \dots, V_t$ be a sequence of real-valued random variables with averaged bounded differences, i.e.

$$\left|E[V_t \mid V_{0:i}] - E[V_t \mid V_{0:(i-1)}]\right| \le c_i$$

for some $c_i \in \mathbb{R}$ and all $i = 1, \dots, t$. Then for any $s > 0$ we get

$$P_v\{V_t \ge E_v V_t + s\} \le e^{-2s^2/c} \quad \text{and}$$
$$P_v\{V_t \le E_v V_t - s\} \le e^{-2s^2/c},$$

where $c := 4\sum_{i=1}^t c_i^2$.

Proof: Fix $t \in \mathbb{N}$ and consider $Y_i := E[V_t \mid V_{0:i}]$ for $i = 0, 1, \dots, t$. This gives a martingale with respect to $(V_i)$. Furthermore, $Y_0 = E[V_t \mid V_0] = E_v V_t$ and $Y_t = V_t$. By assumption, the martingale $(Y_i)$ has bounded differences

$$-c_i \le Y_i - Y_{i-1} \le c_i \quad \text{for all } i \in [t].$$

Applying Azuma's inequality to $(Y_i)$ finishes the proof. □

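Now we apply this result to our Markov chain $V_t = S(X_t)$ to show that it is concentrated about its mean. This will follow from Proposition 4.3, where we use the result from Proposition 3.2 to show that $(V_t)$ has averaged bounded differences.

Proposition 4.4. Let $(X_t)$ be the lazy random walk Metropolis chain on $\{0,1\}^n$ for $\pi(x) = \theta^{S(x)}(1+\theta)^{-n}$. Let $V_t := S(X_t)$ and also $X_0 = x$, so $v := V_0 = S(x)$. Then for any $s > 0$ we get

$$P_v\{V_t \ge E_v V_t + s\} \le e^{-2s^2/(9t)} \quad \text{and}$$
$$P_v\{V_t \le E_v V_t - s\} \le e^{-2s^2/(9t)}.$$

Proof: Let $\gamma := 1 - \frac{1+\theta}{2n}$ and fix any $t \in \mathbb{N}$. Since $(V_i)$ is a Markov chain, we get from Proposition 3.2 and the Markov property that for any $i = 0, 1, \dots, t$

$$E[V_t \mid V_{0:i}] = E_{V_i}[V_{t-i}] = \gamma^{t-i}\left(V_i - \frac{n\theta}{1+\theta}\right) + \frac{n\theta}{1+\theta}.$$

Therefore for any $i = 1, \dots, t$ we get

$$\begin{aligned} E[V_t \mid V_{0:i}] - E[V_t \mid V_{0:(i-1)}] &= \gamma^{t-i}\left(V_i - \frac{n\theta}{1+\theta}\right) - \gamma^{t-(i-1)}\left(V_{i-1} - \frac{n\theta}{1+\theta}\right) \\ &= \gamma^{t-i}\left[V_i - V_{i-1} + (1-\gamma)V_{i-1} - (1-\gamma)\frac{n\theta}{1+\theta}\right] \\ &= \gamma^{t-i}\left[V_i - V_{i-1} + \frac{1+\theta}{2n}V_{i-1} - \frac{\theta}{2}\right], \end{aligned}$$

and so

$$\left|E[V_t \mid V_{0:i}] - E[V_t \mid V_{0:(i-1)}]\right| \le \gamma^{t-i}\,\frac{3}{2} \le \frac{3}{2},$$

since we have $V_i \in [0,n]$ and $|V_i - V_{i-1}| \le 1$, and also $\theta \in (0,1]$. Applying Proposition 4.3 with $c_i := \frac{3}{2}$ now gives the result. □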
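The averaged bounded differences estimate can be checked by brute force for a small instance. This sketch uses the closed form from Proposition 3.2 and maximizes the difference over all time indices and all admissible pairs $(V_{i-1}, V_i)$. The values $n = 8$, $\theta = 1/2$, $t = 12$ and the function names are illustrative assumptions.

```python
from fractions import Fraction

def mean_after(v, s, n, theta):
    """E_v(V_s) from Proposition 3.2."""
    gamma = 1 - (1 + theta) / (2 * n)
    center = n * theta / (1 + theta)
    return gamma ** s * (v - center) + center

def largest_averaged_difference(n, theta, t):
    """Max over i and over V_{i-1} = u, V_i = w of |E[V_t | V_i = w] - E[V_t | V_{i-1} = u]|."""
    worst = Fraction(0)
    for i in range(1, t + 1):
        for u in range(n + 1):
            for w in (u - 1, u, u + 1):         # the chain moves by at most one per step
                if 0 <= w <= n:
                    diff = (mean_after(w, t - i, n, theta)
                            - mean_after(u, t - i + 1, n, theta))
                    worst = max(worst, abs(diff))
    return worst

n, theta, t = 8, Fraction(1, 2), 12  # illustrative values
worst = largest_averaged_difference(n, theta, t)
print(worst, worst <= Fraction(3, 2))
```

The maximum stays below $3/2$, consistent with the constant used in Proposition 4.4. The check is slightly conservative, because it also includes the pair $(u, w) = (n, n+1)$-type boundaries only when they lie inside $\{0, \dots, n\}$.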

Now we use the concentration result for $(V_t)$ from Proposition 4.4 to establish a lower bound on the mixing time for this Markov chain.

Proof of Proposition 4.1: Fix $\delta \in (0, \frac{1}{2})$ and set $t := t_{n,\delta} := \frac{1}{(1+\theta)(1+\delta)}\, n \log n$. To establish (10) from Proposition 4.1 we need to show

$$\lim_{n \to \infty} d(t_{n,\delta}) = 1. \tag{11}$$

(Note that since $d(t)$ is decreasing in $t$, it is enough to prove (10) for large $C < 1$, so our restriction $\delta < \frac{1}{2}$ is not problematic.) To this end, denote with $V_t := S(X_t)$ the projection of the chain under $S$, started at $X_0 = 1$, so that $V_0 = n$, and note that for any $r \in [0,n]$ we get

$$\begin{aligned} d(t) &\ge \|P^t(1,\cdot) - \pi\| \\ &= \sup_{A \subset \{0,1\}^n} P^t(1,A) - \pi(A) \\ &\ge \sup_{L \subset [n]} P^t(1, S^{-1}(L)) - \pi(S^{-1}(L)) \\ &= \sup_{L \subset [n]} P^t(n, L) - \pi_S(L) \\ &\ge P_n\{V_t \ge r\} - P\{B_n \ge r\} \\ &= 1 - P_n\{V_t < r\} - P\{B_n \ge r\}. \end{aligned}$$

Here the second inequality comes from noting that the right supremum is over a subset of events from the left supremum. (Therefore, total variation distance can only decrease after projecting down.) Furthermore, $B_n$ stands for a random variable with distribution $\pi_S = \mathrm{Binomial}\left(n, \frac{\theta}{1+\theta}\right)$, and we used the event $L := \{\lceil r \rceil, \lceil r \rceil + 1, \dots, n\}$ for the last inequality. Our task is to find $r \in [0,n]$ such that

$$\lim_{n \to \infty} P_n\{V_t < r\} = 0 \quad \text{and} \tag{12}$$
$$\lim_{n \to \infty} P\{B_n \ge r\} = 0. \tag{13}$$

To this end let $\rho := \frac{\delta}{3}$ and $c := c_{n,\delta} := 1 - n^{-\rho}$, so $c \in (0,1)$, and define a convex combination of the means of the two random variables $V_t$ and $B_n$ by

$$r := r_{n,\delta} := c\,E_n V_t + (1-c)\frac{n\theta}{1+\theta} = \frac{n\theta}{1+\theta} + c\,\gamma^t\,\frac{n}{1+\theta}.$$

Here we used $\gamma := 1 - \frac{1+\theta}{2n}$ and Proposition 3.2 for the last equality. Define

$$s_1 := E_n V_t - r = (1-c)\left[E_n V_t - \frac{n\theta}{1+\theta}\right] = n^{-\rho}\,\gamma^t\,\frac{n}{1+\theta},$$
$$s_2 := r - \frac{n\theta}{1+\theta} = c\left[E_n V_t - \frac{n\theta}{1+\theta}\right] = (1 - n^{-\rho})\,\gamma^t\,\frac{n}{1+\theta}.$$

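To establish (12) we use the concentration result for $V_t$ from Proposition 4.4. This shows

$$P_n\{V_t < r\} = P_n\{V_t < E_n V_t - s_1\} \le \exp\left\{-\frac{2s_1^2}{9t}\right\}.$$

Now fix any $\varepsilon \in (0,1)$ and we get for large $n$ that

$$\begin{aligned} \frac{s_1^2}{t} &= \frac{n^{2-2\rho}}{t\,(1+\theta)^2}\left(1 - \frac{1+\theta}{2n}\right)^{2t} \\ &= \frac{n^{2-2\rho}\left(1 - \frac{1+\theta}{2n}\right)^{\frac{2}{(1+\theta)(1+\delta)} n \log n}}{\frac{1+\theta}{1+\delta}\, n \log n} \\ &\ge \frac{n^{1-2\rho}(1-\varepsilon)\exp\left\{-\frac{1}{1+\delta}\log n\right\}}{\frac{1+\theta}{1+\delta} \log n} \\ &= \frac{(1+\delta)(1-\varepsilon)}{1+\theta} \cdot \frac{n^{1 - 2\rho - \frac{1}{1+\delta}}}{\log n}. \end{aligned}$$

Here the inequality comes from $\left(1 - \frac{x}{n}\right)^n \to e^{-x}$ as $n$ goes to infinity for any $x \in \mathbb{R}$. Now the right hand side of this expression goes to infinity as $n$ goes to infinity because the left ratio is a positive constant and the right ratio goes to infinity as $n$ goes to infinity since $1 - 2\rho - \frac{1}{1+\delta} > 0$. This proves (12).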



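To establish (13), we use Chebyshev's inequality for the $\mathrm{Binomial}\left(n, \frac{\theta}{1+\theta}\right)$ random variable $B_n$. We get for any fixed $\varepsilon \in (0,1)$ and large $n$ that

$$\begin{aligned} P\{B_n \ge r\} &= P\left\{B_n - \frac{n\theta}{1+\theta} \ge s_2\right\} \\ &\le \frac{\mathrm{Var}\,B_n}{s_2^2} = \frac{\frac{n\theta}{(1+\theta)^2}}{c^2\,\gamma^{2t}\,\frac{n^2}{(1+\theta)^2}} \\ &= \frac{\theta}{c^2\, n \left(1 - \frac{1+\theta}{2n}\right)^{\frac{2}{(1+\theta)(1+\delta)} n \log n}} \\ &\le \frac{\theta}{c^2\, n\,(1-\varepsilon)\exp\left\{-\frac{1}{1+\delta}\log n\right\}}. \end{aligned}$$

The last inequality again comes from $\left(1 - \frac{x}{n}\right)^n \to e^{-x}$ as $n$ goes to infinity for any $x \in \mathbb{R}$. Now the right hand side of this expression goes to zero as $n$ goes to infinity, since $c = c_{n,\delta}$ goes to one as $n$ goes to infinity, and $1 - \frac{1}{1+\delta} = \frac{\delta}{1+\delta} > 0$. This proves (13) and therefore (11), finishing the proof of the lower bound (10) for Theorem 2.1. □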



Remark 4.5: Note that our concentration result for $V_t$ from Azuma's inequality is not strong enough to establish the window size $n$ of the cutoff. To see this, note that the smallest $r$ that ensures that $P\{B_n > r\}$ goes to zero as $\alpha$ goes to infinity would be of the form $r = \frac{n\theta}{1+\theta} + \alpha\sqrt{n}$, since $E B_n = \frac{n\theta}{1+\theta}$ and $\mathrm{Var}\,B_n = O(n)$. However, in that case for $t_{n,\alpha} := \frac{1}{1+\theta} n \log n - \alpha n$, Azuma's inequality only gives a trivial upper bound of one for $\lim_{\alpha \to \infty} \lim_{n \to \infty} P_n\{V_{t_{n,\alpha}} < r\}$.



5 Upper bound 

To establish the upper bound part of Theorem 2.1 we use a two-stage coupling procedure similar to the one used in Levin et al. (2009, Theorem 18.3, page 251) for the case where $\pi$ is uniform ($\theta = 1$). The first step there is to show that upper bounding $d(t) = \sup_x \|P^t(x,\cdot) - \pi\|$ can be reduced to the problem of upper bounding $d_S(t) = \sup_k \|P^t(k,\cdot) - \pi_S\|$, i.e. to show that it is enough to analyze the mixing time of the one-dimensional projection $S(X_t) \in \{0, 1, \dots, n\}$ of $X_t \in \{0,1\}^n$.

This is possible for $\theta = 1$ since by symmetry, the total variation distance to stationarity is the same for all starting states:

$$\|P^t(x,\cdot) - \pi\| = \|P^t(y,\cdot) - \pi\| \quad \text{for all } x, y \in \{0,1\}^n.$$

Therefore it is enough to consider the starting state $X_0 = 1 = (1, 1, \dots, 1)$ of all ones. But for this starting state the transition probabilities are constant on level sets $L(k) := \{z \in \{0,1\}^n : S(z) = k\}$, where $k = 0, 1, \dots, n$. By induction on $t$, we get this also for the $t$-step transition probabilities:

$$P^t(1, y) = P^t(1, z) \quad \text{whenever } S(y) = S(z).$$

Since $\pi$ is also constant on level sets, this entails

$$\begin{aligned} \|P^t(1,\cdot) - \pi\| &= \frac{1}{2}\sum_{l=0}^n \sum_{z : S(z) = l} \left|P^t(1,z) - \pi(z)\right| \\ &= \frac{1}{2}\sum_{l=0}^n \left|\sum_{z : S(z) = l} P^t(1,z) - \pi(z)\right| \\ &= \frac{1}{2}\sum_{l=0}^n \left|P^t(n,l) - \pi_S(l)\right| \\ &= \|P^t(n,\cdot) - \pi_S\|, \end{aligned}$$

where we can move the absolute values outside the (inner) sum in the second equality because all the terms in the sum are equal. So total variation distance stays the same under the one-dimensional projection $S$ if we start from $X_0 = 1$ (or $0$).

When $\theta \in (0,1)$, the $t$-step transition probabilities are still constant on level sets if we start the chain at $X_0 = 1$ (or $0$). However, total variation distance is not necessarily the same for each starting state in this case. Also, we have been unable to show that $X_0 = 1$ (or $0$) is a worst starting state, i.e. we couldn't prove

$$d(t) = \sup_x \|P^t(x,\cdot) - \pi\| = \max\{\|P^t(1,\cdot) - \pi\|,\ \|P^t(0,\cdot) - \pi\|\},$$

so we don't know whether bounding the mixing time for $\theta \in (0,1)$ can be reduced to a one-dimensional problem in the same way as when $\pi$ is uniform.

However, we have been able to get an upper bound on the mixing time using a two-dimensional projection that depends on the starting state $X_0 = x$. For this, consider $Z := Z_x : \{0,1\}^n \to \{0, 1, \dots, n\} \times \{0, 1, \dots, n\}$,

$$X_t \mapsto Z_x(X_t) := Z_t := (S(X_t), d(x, X_t)),$$

where $d$ is graph distance on the hypercube (i.e. $d(x,y)$ is the number of coordinates where $x$ and $y$ disagree) and $x = X_0$ is the starting state of the chain. In words, we project down to the number of ones, $S(X_t)$, and the distance to the starting state, $d(X_t, X_0)$. This two-dimensional chain is very similar to the two-coordinate chain used in Levin, Luczak and Peres (2010, section 3) for the related model of Glauber dynamics for the mean-field Ising model.

Our proof proceeds by showing that $(Z_t)$ is a (two-dimensional) birth and death chain with the same total variation distance to its stationary distribution as the original chain $X_t$. Bounding this distance is then achieved by a two-stage coupling procedure. In the first stage (drift-regime) we use an independence coupling of two versions of the chain that brings them close together after $\frac{1}{1+\theta} n \log n$ steps due to the drift towards the mean in this birth and death chain. In the second stage (entropy-regime) we use a coupling with a certain Ornstein-type coupling of two lazy simple random walks on $\mathbb{Z}$ to ensure that the two versions of the chain coalesce after an additional $\alpha n$ steps.

We begin by establishing some properties of the two-dimensional projection $Z_t := Z_x(X_t)$ of $X_t$, for which we need some more notation. Fix $x, z \in \{0,1\}^n$ and let $S(x) = k$, $S(z) = l$, $d(x,z) = l'$. Define

$$\begin{aligned} G &:= G(x,z) := \{i \in [n] : x^{(i)} = 0,\ z^{(i)} = 0\}, \\ N &:= N(x,z) := \{i \in [n] : x^{(i)} = 0,\ z^{(i)} = 1\}, \\ E &:= E(x,z) := \{i \in [n] : x^{(i)} = 1,\ z^{(i)} = 0\}, \\ F &:= F(x,z) := \{i \in [n] : x^{(i)} = 1,\ z^{(i)} = 1\}. \end{aligned}$$

When clear from the context, we might suppress the dependence of $G, N, E, F$ on $x, z$ in the notation.

For the number of elements in these four sets we get

$$\begin{aligned} \#G &= n - \frac{l + l' + k}{2}, \\ \#N &= \frac{l + l' - k}{2}, \\ \#E &= \frac{l' - l + k}{2}, \\ \#F &= \frac{l - (l' - k)}{2}. \end{aligned} \tag{15}$$

This follows from $\#N + \#E = l'$, $\#N + \#F = l$, $\#E + \#F = k$, $\#G + \#N = n - k$, e.g. by starting with the observation

$$l' - l + \#F = l' - \#N = \#E = k - \#F,$$

which gives the last equality $2\#F = l - (l' - k)$ above. The remaining equalities then follow. This is probably best understood by looking at an example. Suppose

x = 00000000111,
z = 00000111001.

Then $n = 11$, $S(x) = k = 3$, $S(z) = l = 4$, $d(x,z) = l' = 5$ and $\#G = 5$, $\#N = 3$, $\#E = 2$, $\#F = 1$. Thus, we have a one-to-one correspondence between $(n, k, l, l')$ and $(\#G, \#N, \#E, \#F)$. Note that the formulas for $\#G, \#N, \#E, \#F$ above all give integers because $l$, $l' - k$, $l' + k$ always have the same parity.
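The worked example can be reproduced in a few lines. The sketch below counts the four coordinate classes directly and also recomputes them from $(n, k, l, l')$ via the formulas above; the function names are illustrative.

```python
def partition_sizes(x, z):
    """Sizes of G, N, E, F: coordinates with (x_i, z_i) = (0,0), (0,1), (1,0), (1,1)."""
    pairs = list(zip(x, z))
    return tuple(pairs.count(p) for p in ((0, 0), (0, 1), (1, 0), (1, 1)))

def sizes_from_projection(n, k, l, l_prime):
    """The same four sizes from n, k = S(x), l = S(z) and l' = d(x, z)."""
    g = n - (l + l_prime + k) // 2
    n_size = (l + l_prime - k) // 2
    e = (l_prime - l + k) // 2
    f = (l - (l_prime - k)) // 2
    return g, n_size, e, f

x = [int(c) for c in "00000000111"]
z = [int(c) for c in "00000111001"]
k, l = sum(x), sum(z)
l_prime = sum(a != b for a, b in zip(x, z))
print(partition_sizes(x, z), sizes_from_projection(len(x), k, l, l_prime))
```

Integer division is safe here because, as noted above, the numerators are always even.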

Proposition 5.1. For each $x \in \{0,1\}^n$ the $t$-step transition probabilities $P^t(x,\cdot)$ are constant on sets

$$L(l) \cap D(x, l'), \quad \text{for } l, l' \in \{0, 1, \dots, n\},$$

where $L(l) := \{z \in \{0,1\}^n : S(z) = l\}$ and $D(x,l') := \{z \in \{0,1\}^n : d(x,z) = l'\}$.

Proof: Fix $x \in \{0,1\}^n$ and let $k := S(x)$. We use induction on $t$. For $t = 1$, the only nontrivial cases to check are $l' = 1$ and $l - k = \pm 1$. For any $y, z \in L(l) \cap D(x,l')$ we get

$$P(x,y) = \frac{1}{2n}\theta = P(x,z) \quad \text{if } l = k+1 \text{ and}$$
$$P(x,y) = \frac{1}{2n} = P(x,z) \quad \text{if } l = k-1,$$

so the statement is true for $t = 1$. Now suppose it's true for $t$ and fix $l, l' \in \{0, 1, \dots, n\}$ as well as $y, z \in L(l) \cap D(x,l')$. Recall the definition of $G(x,z), G(x,y), N(x,z)$, etc. for these $x, y, z$ from above and define

$$\begin{aligned} \tilde{G}(x,z) &= \{v \in \{0,1\}^n : \exists i \in G(x,z) \text{ such that } v^{(i)} = z^{(i)} + 1,\ v^{(j)} = z^{(j)}\ \forall j \ne i\}, \\ \tilde{N}(x,z) &= \{v \in \{0,1\}^n : \exists i \in N(x,z) \text{ such that } v^{(i)} = z^{(i)} - 1,\ v^{(j)} = z^{(j)}\ \forall j \ne i\}, \\ \tilde{E}(x,z) &= \{v \in \{0,1\}^n : \exists i \in E(x,z) \text{ such that } v^{(i)} = z^{(i)} + 1,\ v^{(j)} = z^{(j)}\ \forall j \ne i\}, \\ \tilde{F}(x,z) &= \{v \in \{0,1\}^n : \exists i \in F(x,z) \text{ such that } v^{(i)} = z^{(i)} - 1,\ v^{(j)} = z^{(j)}\ \forall j \ne i\}. \end{aligned}$$

Similarly for $y$. For example, $\tilde{G}(x,z)$ is the set of $v$'s that agree with $z$ everywhere except for one coordinate in $G(x,z)$, i.e. a coordinate where both $x$ and $z$ are zero. By assumption, $y$ and $z$ are in the same set $L(l) \cap D(x,l')$. So clearly, $\#G(x,y) = \#\tilde{G}(x,y) = \#\tilde{G}(x,z) = \#G(x,z)$. Furthermore, we get

$$\tilde{G}(x,y),\ \tilde{G}(x,z) \subset L(l+1) \cap D(x, l'+1).$$

This means that if $s \in \tilde{G}(x,y)$, $s' \in \tilde{G}(x,z)$, then $s$ and $s'$ are in the same two-dimensional level set $L(l+1) \cap D(x, l'+1)$. This implies $P(s,y) = P(s',z)$ by the transition rule for the Markov chain, and it further implies $P^t(x,s) = P^t(x,s')$ by the induction hypothesis. Similarly for $N, E, F$. Therefore we get

$$\begin{aligned} P^{t+1}(x,z) &= P^t(x,z)P(z,z) + \sum_{s' \in \tilde{G}(x,z)} P^t(x,s')P(s',z) + \sum_{s' \in \tilde{N}(x,z)} P^t(x,s')P(s',z) \\ &\quad + \sum_{s' \in \tilde{E}(x,z)} P^t(x,s')P(s',z) + \sum_{s' \in \tilde{F}(x,z)} P^t(x,s')P(s',z) \\ &= P^t(x,y)P(y,y) + \sum_{s \in \tilde{G}(x,y)} P^t(x,s)P(s,y) + \sum_{s \in \tilde{N}(x,y)} P^t(x,s)P(s,y) \\ &\quad + \sum_{s \in \tilde{E}(x,y)} P^t(x,s)P(s,y) + \sum_{s \in \tilde{F}(x,y)} P^t(x,s)P(s,y) \\ &= P^{t+1}(x,y). \end{aligned}$$

So the statement is also true for $t + 1$ and we are done. □
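Proposition 5.1 can be verified exhaustively on a small hypercube. The sketch below computes $P^t(x,\cdot)$ exactly for the lazy chain with constant $\theta$ and checks that it takes a single value on each set $L(l) \cap D(x,l')$. The starting state, $n = 5$, $\theta = 1/4$ and the helper names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def kernel(n, theta):
    """Lazy Metropolis kernel on {0,1}^n with constant upward acceptance theta."""
    states = list(product((0, 1), repeat=n))
    P = {}
    for x in states:
        row = {x: Fraction(1, 2)}
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            accept = Fraction(1) if x[i] == 1 else theta
            row[y] = accept / (2 * n)
            row[x] += (1 - accept) / (2 * n)
        P[x] = row
    return states, P

def t_step_from(x, t, states, P):
    """Exact distribution P^t(x, .) as a dict over all states."""
    dist = {s: Fraction(0) for s in states}
    dist[x] = Fraction(1)
    for _ in range(t):
        new = {s: Fraction(0) for s in states}
        for s, mass in dist.items():
            for y, p in P[s].items():
                new[y] += mass * p
        dist = new
    return dist

def constant_on_level_sets(x, dist):
    """True if dist takes one value on each set {z : S(z) = l, d(x, z) = l'}."""
    seen = {}
    for z, mass in dist.items():
        key = (sum(z), sum(a != b for a, b in zip(x, z)))
        if seen.setdefault(key, mass) != mass:
            return False
    return True

n, theta = 5, Fraction(1, 4)  # illustrative values
states, P = kernel(n, theta)
x = (1, 0, 1, 0, 0)           # an arbitrary starting state
checks = [constant_on_level_sets(x, t_step_from(x, t, states, P)) for t in range(1, 7)]
print(all(checks))
```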

Corollary 5.2. The projection Zf := Zx{Xt) := (S{Xt),d{x,Xt)) is Markov and we get 

||P*(x,-)-7r|| = \\V{kZt) - TTzJl 

where x = Xq and k = S{x). Here, T>{kZt) denotes the distribution of the chain (Zf) at time t when 
started at Zq = (k, 0) and ttz^ '■= t^Z~^ is the stationary distribution of this chain. 

Proof: Fix $X_0 = x$ with $S(x) = k$ and consider the equivalence relation $\sim$ corresponding to the classes $L(l) \cap D(x,l')$, for $l, l' \in \{0,1,\dots,n\}$. That is, for $y, z \in \{0,1\}^n$ we have

\[ y \sim z \ :\Longleftrightarrow\ S(y) = S(z) \text{ and } d(x,y) = d(x,z). \]

Then the projection $(Z_t)$ of $(X_t)$ is a Markov chain if

\[ P(y, L(h) \cap D(x,h')) = P(z, L(h) \cap D(x,h')) \tag{16} \]

for all $h, h' \in \{0,1,\dots,n\}$ and all $y, z \in \{0,1\}^n$ such that $y \sim z$. But this follows from symmetry (exchangeability of the coordinates) in our Markov chain $(X_t)$. To see this, fix any $y, z \in \{0,1\}^n$ such that $y \sim z$ and let $l := S(y) = S(z)$ and $l' := d(x,y) = d(x,z)$. Fix any $h, h' \in \{0,1,\dots,n\}$. Note that $S(X_t)$ and $d(x,X_t)$ can only stay the same or change by $\pm 1$ in one step of the Markov chain. So in order to show (16), there are only a few cases to distinguish:

$l = h$: Equation (16) reads "$P(y,y) = P(z,z)$" for $l' = h'$ and "$0 = 0$" for $l' \ne h'$.

$l = h - 1$: For $l' = h' - 1$ equation (16) reads

\[ P\{\text{pick } i \in G(x,y) \text{ and flip } 0 \to 1\} = \#G(x,y)\,\frac{\theta}{2n} = \#G(x,z)\,\frac{\theta}{2n} = P\{\text{pick } i \in G(x,z) \text{ and flip } 0 \to 1\}. \]

Similarly for $l' = h' + 1$. For $l' \ne h' \pm 1$ we get "$0 = 0$".

$l = h + 1$: This is entirely analogous to the case $l = h - 1$.

In all other cases equation (16) reads "$0 = 0$". This proves (16), showing that the process $(Z_t)$ is in fact a Markov chain.

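Total variation distance to stationarity remains unchanged under the projection because both $P^t(x,\cdot)$ and $\pi$ are constant on the sets $L(l) \cap D(x,l')$ for $l, l' \in \{0,1,\dots,n\}$, by Proposition 5.1. That allows us to pull the absolute value out of the inner sum in the second equation below, because all the terms in the sum are equal:

\begin{align*}
\|P^t(x,\cdot) - \pi\| &= \frac12 \sum_{l} \sum_{l'} \sum_{z \in L(l) \cap D(x,l')} \left| P^t(x,z) - \pi(z) \right| \\
&= \frac12 \sum_{l} \sum_{l'} \Big| \sum_{z \in L(l) \cap D(x,l')} \big( P^t(x,z) - \pi(z) \big) \Big| \\
&= \frac12 \sum_{l} \sum_{l'} \left| P^t\big(x, L(l) \cap D(x,l')\big) - \pi\big(L(l) \cap D(x,l')\big) \right| \\
&= \|P^t(x,\cdot)Z_x^{-1} - \pi Z_x^{-1}\|.
\end{align*}

Clearly, $P^t(x,\cdot)Z_x^{-1} = \mathcal{D}({}_k Z_t)$, and the fact that $\pi_{Z_x} := \pi Z_x^{-1}$ is stationary for $(Z_t)$ is an elementary calculation summarized in the Lemma below. □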
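
The invariance of total variation distance under the projection can also be observed numerically. The sketch below assumes the same lazy Metropolis rule as in the introduction and compares the distance on the full hypercube with the distance between the pushed-forward distributions on the classes $(S(y), d(x,y))$; the helper names are hypothetical.

```python
import itertools
import numpy as np


def metropolis_kernel(n, theta):
    """Lazy random walk Metropolis kernel on {0,1}^n for pi(x) ~ theta^S(x)."""
    states = list(itertools.product([0, 1], repeat=n))
    index = {s: i for i, s in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for s in states:
        for c in range(n):
            flipped = s[:c] + (1 - s[c],) + s[c + 1:]
            accept = theta if s[c] == 0 else 1.0
            P[index[s], index[flipped]] = accept / (2 * n)
        P[index[s], index[s]] = 1.0 - P[index[s]].sum()
    return states, P


def tv_full_and_projected(n, theta, x, t):
    """Return (TV distance on {0,1}^n, TV distance after projecting by Z_x)."""
    states, P = metropolis_kernel(n, theta)
    pi = np.array([theta ** sum(s) for s in states]) / (1 + theta) ** n
    row = np.linalg.matrix_power(P, t)[states.index(x)]
    tv_full = 0.5 * np.abs(row - pi).sum()

    proj_row, proj_pi = {}, {}
    for j, y in enumerate(states):
        key = (sum(y), sum(a != b for a, b in zip(x, y)))  # Z_x(y)
        proj_row[key] = proj_row.get(key, 0.0) + row[j]
        proj_pi[key] = proj_pi.get(key, 0.0) + pi[j]
    tv_proj = 0.5 * sum(abs(proj_row[key] - proj_pi[key]) for key in proj_row)
    return float(tv_full), float(tv_proj)
```

For a projection that does not respect the symmetry of the chain the projected distance would in general only be a lower bound; equality here reflects that both distributions are constant on the classes.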

Lemma 5.3. Let $(X_t)$ be a Markov chain on $\mathcal{X}$ and suppose $S : \mathcal{X} \to \mathcal{Y}$ is onto and such that

\[ P(x, S^{-1}(y)) = P(x', S^{-1}(y)) \]

for all $y \in \mathcal{Y}$ and all $x, x' \in \mathcal{X}$ such that $S(x) = S(x')$, so that $(S(X_t))$ is a Markov chain on $\mathcal{Y}$. If $\pi$ is stationary for $(X_t)$, then $\pi S^{-1}$ is stationary for $(S(X_t))$.

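Proof: Fix $y \in \mathcal{Y}$ and for every $z \in \mathcal{Y}$ pick some $v_z \in S^{-1}(z)$. Then by stationarity of $\pi$ for $(X_t)$ we get

\begin{align*}
\sum_{z \in \mathcal{Y}} \pi S^{-1}(z)\, P(z,y) &= \sum_{z \in \mathcal{Y}} \pi S^{-1}(z)\, P(v_z, S^{-1}(y)) \\
&= \sum_{z \in \mathcal{Y}} \sum_{x \in S^{-1}(z)} \pi(x)\, P(v_z, S^{-1}(y)) \\
&= \sum_{z \in \mathcal{Y}} \sum_{x \in S^{-1}(z)} \pi(x)\, P(x, S^{-1}(y)) \\
&= \sum_{v \in S^{-1}(y)} \sum_{x \in \mathcal{X}} \pi(x)\, P(x,v) \\
&= \pi S^{-1}(y). \qquad \square
\end{align*}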
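
The lemma can be illustrated with a generic lumping routine. This is a sketch under stated assumptions: `lump` is a hypothetical helper that checks the hypothesis $P(x, S^{-1}(y)) = P(x', S^{-1}(y))$ numerically and returns the projected kernel, and the example uses the lazy Metropolis chain from the introduction with $S$ equal to the number of ones.

```python
import itertools
import numpy as np


def lump(P, labels):
    """Projected kernel Q on the label set, plus the indicator matrix M of S.

    Raises ValueError if P(x, S^{-1}(y)) differs between states with equal label.
    """
    classes = sorted(set(labels))
    col = {c: i for i, c in enumerate(classes)}
    M = np.zeros((len(labels), len(classes)))
    for j, lab in enumerate(labels):
        M[j, col[lab]] = 1.0
    PM = P @ M  # PM[x, y] = P(x, S^{-1}(y))
    Q = np.zeros((len(classes), len(classes)))
    for c in classes:
        rows = PM[[j for j, lab in enumerate(labels) if lab == c]]
        if not np.allclose(rows, rows[0]):
            raise ValueError("kernel is not lumpable for these labels")
        Q[col[c]] = rows[0]
    return Q, M


def hypercube_example(n=4, theta=0.5):
    """Max deviation of (pi S^{-1}) Q from pi S^{-1} for S = number of ones."""
    states = list(itertools.product([0, 1], repeat=n))
    index = {s: i for i, s in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for s in states:
        for c in range(n):
            flipped = s[:c] + (1 - s[c],) + s[c + 1:]
            P[index[s], index[flipped]] = (theta if s[c] == 0 else 1.0) / (2 * n)
        P[index[s], index[s]] = 1.0 - P[index[s]].sum()
    pi = np.array([theta ** sum(s) for s in states]) / (1 + theta) ** n
    Q, M = lump(P, [sum(s) for s in states])
    pushed = pi @ M  # the push-forward pi S^{-1}
    return float(np.abs(pushed @ Q - pushed).max())
```

The matrix identity behind the code is $(\pi M) Q = \pi P M = \pi M$, which is the chain of equalities in the proof written in matrix form.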



Remark 5.4: Proposition 5.1, Corollary 5.2 and their proofs stay true when we allow $\theta = \theta_{S(X_t)}$ to depend on $S(X_t)$, as in the random walk Metropolis chain for an arbitrary rotationally symmetric (i.e. $\pi(x) = \pi(y)$ whenever $S(x) = S(y)$) distribution $\pi$ on the hypercube, as discussed in the introduction. Also note that unimodality of $\pi$ is not needed for this result.

To establish the upper bound part of cutoff in Theorem 2.1, by the Corollary it remains to show that for $t := t_{n,a} := \frac{1}{1+\theta}\, n \log n + an$ we get

\[ \lim_{a \to \infty} \limsup_{n \to \infty} \sup_{x \in \{0,1\}^n} \|\mathcal{D}({}_{S(x)} Z_t) - \pi_{Z_x}\| = 0. \tag{17} \]

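For any fixed $X_0 = x$ with $S(x) = k$ the projected chain $(Z_t) = (Z_x(X_t))$ has the following transition kernel:

\[
p\big((l,l'),(h,h')\big) = \begin{cases}
\frac{2n-(l'+l+k)}{4n}\,\theta & : h = l+1,\ h' = l'+1, \\
\frac{l'+l-k}{4n} & : h = l-1,\ h' = l'-1, \\
\frac{k+l'-l}{4n}\,\theta & : h = l+1,\ h' = l'-1, \\
\frac{k-(l'-l)}{4n} & : h = l-1,\ h' = l'+1, \\
\frac12 + \frac{n-l}{2n}\,(1-\theta) & : h = l,\ h' = l', \\
0 & : \text{otherwise.}
\end{cases} \tag{18}
\]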



This follows from (15) together with the transition rule of the original chain $(X_t)$. For example,

\[ p\big((l,l'),(l+1,l'+1)\big) = P\{\text{pick } i \in G \text{ and flip } 0 \to 1\} = \#G\,\frac{\theta}{2n}. \]
For $k \le \frac{n}{2}$ the state space of this chain is

\begin{align*}
\big\{ (l,l') \in \{0,1,\dots,n\}^2 :\ & l' \in k + \{-l, -l+2, \dots, l\} && \text{for } l \le k, \\
& l' \in k + \{l-2k, l-2k+2, \dots, l\} && \text{for } k < l \le n-k, \\
& l' \in n-k + \{l-n, l-n+2, \dots, n-l\} && \text{for } n-k < l \big\}.
\end{align*}

A similar result holds for $k > \frac{n}{2}$. In both cases, after reparametrizing

\[ (l,l') \mapsto (l'-l,\ l'+l) =: (r,r'), \]

the state space becomes $\{(r,r') : r \in \{-k, -k+2, \dots, k\},\ r' \in \{k, k+2, \dots, 2n-k\}\}$. The boundaries $-k \le r \le k$ and $k \le r' \le 2n-k$ here can also be confirmed like this: By definition we have

\begin{align*}
r &= l' - l = N + E - (N + F) = E - F \quad \text{and} \\
r' &= l' + l = N + E + (N + F) = 2N + k.
\end{align*}

Since $F \ge 0$, we get $r = E - F \le E \le E + F = k$. Also, since $E \ge 0$ and $F \le k$, we get $r = E - F \ge -F \ge -k$. Similarly, since $N \le n-k$, we get $r' = 2N + k \le 2(n-k) + k = 2n-k$. And since $N \ge 0$, we get $r' = 2N + k \ge k$.

The transition kernel in this new parametrization becomes

\[
p\big((r,r'),(s,s')\big) = \begin{cases}
\frac{2n-(r'+k)}{4n}\,\theta & : s = r,\ s' = r'+2, \\
\frac{r'-k}{4n} & : s = r,\ s' = r'-2, \\
\frac{k+r}{4n}\,\theta & : s = r-2,\ s' = r', \\
\frac{k-r}{4n} & : s = r+2,\ s' = r', \\
\frac12 + \frac{2n-(r'-r)}{4n}\,(1-\theta) & : s = r,\ s' = r', \\
0 & : \text{otherwise.}
\end{cases} \tag{19}
\]

So the chain $(Z_t)$ can be viewed as a birth and death chain on a rectangle in $\mathbb{Z}^2$. A particular feature of this chain is that the probability of moving up (down) in the $r$-dimension only depends on the current location in that dimension: it only depends on $r$, not on $r'$. Similarly, the probability of moving up (down) in the $r'$-dimension only depends on $r'$, not on $r$. The problem of coupling two versions of this chain can therefore be split up into coupling two one-dimensional chains. Note also that this feature will be lost if we allow $\theta = \theta_{S(X_t)}$ to depend on the number of ones in the current state $X_t$.

We begin by calculating the expected location of the chain $(Z_t)$ after $t$ steps when started at $Z_0 = (r,r')$. Similar to the one-dimensional projection $S(X_t)$ that we analyzed for the lower bound, this can be calculated explicitly by induction on $t$ since the transition probabilities (19) are all linear in the current location $(r,r')$ of the chain.
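
The kernel in the $(r,r')$ parametrization can be compared against a direct enumeration of single-coordinate flips on a small hypercube. The sketch below assumes the lazy Metropolis rule from the introduction; `kernel_19`, `brute_force_increments` and `max_kernel_error` are hypothetical names used only here.

```python
import itertools


def kernel_19(n, k, theta, r, rp):
    """Increment probabilities of the projected chain at (r, r'), as in (19)."""
    return {
        (0, 2): (2 * n - (rp + k)) / (4 * n) * theta,
        (0, -2): (rp - k) / (4 * n),
        (-2, 0): (k + r) / (4 * n) * theta,
        (2, 0): (k - r) / (4 * n),
        (0, 0): 0.5 + (2 * n - (rp - r)) / (4 * n) * (1 - theta),
    }


def brute_force_increments(x, y, theta):
    """Increment law of (r, r') from state y, by enumerating coordinate flips."""
    n = len(x)

    def coords(z):
        ones = sum(z)                                # l  = S(z)
        dist = sum(a != b for a, b in zip(x, z))     # l' = d(x, z)
        return dist - ones, dist + ones              # (r, r')

    r, rp = coords(y)
    moves = {}
    for c in range(n):
        z = y[:c] + (1 - y[c],) + y[c + 1:]
        s, sp = coords(z)
        p = (theta if y[c] == 0 else 1.0) / (2 * n)
        moves[(s - r, sp - rp)] = moves.get((s - r, sp - rp), 0.0) + p
    moves[(0, 0)] = 1.0 - sum(moves.values())        # laziness plus rejections
    return (r, rp), moves


def max_kernel_error(x, theta):
    """Largest discrepancy between (19) and the enumeration over all states y."""
    n, k = len(x), sum(x)
    worst = 0.0
    for y in itertools.product([0, 1], repeat=n):
        (r, rp), moves = brute_force_increments(x, y, theta)
        for inc, p in kernel_19(n, k, theta, r, rp).items():
            worst = max(worst, abs(p - moves.get(inc, 0.0)))
    return worst
```

The enumeration also makes visible the feature noted in the text: the probabilities of the increments $(\pm 2, 0)$ involve only $r$, and those of $(0, \pm 2)$ involve only $r'$.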
For this, fix $X_0 = x \in \{0,1\}^n$ and let $k = S(x)$, $Z_t = Z_x(X_t) = (S(X_t), d(x,X_t))$, so that $Z_0 = (k,0)$. Denote with $Z_t = (Z_t^{(r)}, Z_t^{(r')})$ the coordinates of the chain in the new parametrization (19), so that $(Z_0^{(r)}, Z_0^{(r')}) = (-k,k)$, and write $E_k$ for the expectation operator given this starting state. Then for this parametrization,

\[
Z_{t+1} - Z_t = \begin{cases}
(0,2) & \text{with probability } \frac{2n-(k+Z_t^{(r')})}{4n}\,\theta, \\
(0,-2) & \text{with probability } \frac{Z_t^{(r')}-k}{4n}, \\
(-2,0) & \text{with probability } \frac{k+Z_t^{(r)}}{4n}\,\theta, \\
(2,0) & \text{with probability } \frac{k-Z_t^{(r)}}{4n}, \\
(0,0) & \text{otherwise.}
\end{cases}
\]

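Therefore,

\begin{align*}
E[Z_{t+1} - Z_t \mid Z_t] &= \left( -2\,\frac{k+Z_t^{(r)}}{4n}\,\theta + 2\,\frac{k-Z_t^{(r)}}{4n},\ \ 2\,\frac{2n-(k+Z_t^{(r')})}{4n}\,\theta - 2\,\frac{Z_t^{(r')}-k}{4n} \right) \\
&= \left( \frac{k(1-\theta) - Z_t^{(r)}(1+\theta)}{2n},\ \ \frac{2n\theta + k(1-\theta) - Z_t^{(r')}(1+\theta)}{2n} \right),
\end{align*}

so that

\[ E[Z_{t+1} \mid Z_t] = \left( \frac{k(1-\theta) + [2n-(1+\theta)]\,Z_t^{(r)}}{2n},\ \ \frac{2n\theta + k(1-\theta) + [2n-(1+\theta)]\,Z_t^{(r')}}{2n} \right). \]

By taking expectation, we get

\begin{align*}
E_k[Z_{t+1}] &= \left( \frac{k(1-\theta) + [2n-(1+\theta)]\,E_k Z_t^{(r)}}{2n},\ \ \frac{2n\theta + k(1-\theta) + [2n-(1+\theta)]\,E_k Z_t^{(r')}}{2n} \right) \\
&= \left( \beta + \gamma\, E_k Z_t^{(r)},\ \ \theta + \beta + \gamma\, E_k Z_t^{(r')} \right), \tag{20}
\end{align*}

with the constants $\beta = \frac{k}{2n}(1-\theta)$ and $\gamma = 1 - \frac{1+\theta}{2n}$ that also appear in Proposition 5.5 below.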



By induction on t, this leads to a proof of the following result: 



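Proposition 5.5. Let $Z_t = Z_x(X_t) = (S(X_t), d(x,X_t))$ be the two-dimensional projection of the lazy random walk Metropolis chain $(X_t)$ for $\pi(x) = \theta^{S(x)}(1+\theta)^{-n}$, started at $X_0 = x \in \{0,1\}^n$ with $S(x) = k$. Then in the parametrization (19) and for any $t \in \mathbb{N}$ we get

\[ E_k\big(Z_t^{(r)}, Z_t^{(r')}\big) = \left( \frac{2n\beta}{1+\theta}\,(1-\gamma^t) - k\gamma^t,\ \ \frac{2n(\theta+\beta)}{1+\theta}\,(1-\gamma^t) + k\gamma^t \right), \]

where $\beta := \beta_{n,k,\theta} := \frac{k}{2n}(1-\theta)$ and $\gamma := \gamma_{n,\theta} := 1 - \frac{1+\theta}{2n}$.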
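Proof: The claim is true for $t = 1$, since by (20) we get

\[ E_k\big(Z_1^{(r)}, Z_1^{(r')}\big) = \big( \beta - \gamma k,\ \ \theta + \beta + \gamma k \big). \]

Now suppose the claim is true for $t$. Then by (20) we get that $E_k\big(Z_{t+1}^{(r)}, Z_{t+1}^{(r')}\big)$ is equal to

\begin{align*}
& \left( \beta + \gamma \left[ \frac{2n\beta}{1+\theta}(1-\gamma^t) - k\gamma^t \right],\ \ \theta + \beta + \gamma \left[ \frac{2n(\theta+\beta)}{1+\theta}(1-\gamma^t) + k\gamma^t \right] \right) \\
&= \left( \frac{2n\beta}{1+\theta} \left[ \frac{1+\theta}{2n} + \gamma - \gamma^{t+1} \right] - k\gamma^{t+1},\ \ \frac{2n(\theta+\beta)}{1+\theta} \left[ \frac{1+\theta}{2n} + \gamma - \gamma^{t+1} \right] + k\gamma^{t+1} \right) \\
&= \left( \frac{2n\beta}{1+\theta}\,(1-\gamma^{t+1}) - k\gamma^{t+1},\ \ \frac{2n(\theta+\beta)}{1+\theta}\,(1-\gamma^{t+1}) + k\gamma^{t+1} \right),
\end{align*}

where the last step uses $\frac{1+\theta}{2n} + \gamma = 1$. So it is also true for $t + 1$, finishing the proof. □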
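
As a sanity check on the closed form, one can propagate the exact distribution of the projected chain with the kernel (19) and compare its mean with the formula of the Proposition. This is only a sketch; the function names are hypothetical, and the starting state is $(-k, k)$ as in the text.

```python
def closed_form_mean(n, k, theta, t):
    """Expected location (r, r') after t steps, per the closed form above."""
    beta = k * (1 - theta) / (2 * n)
    gamma = 1 - (1 + theta) / (2 * n)
    g = gamma ** t
    mean_r = 2 * n * beta / (1 + theta) * (1 - g) - k * g
    mean_rp = 2 * n * (theta + beta) / (1 + theta) * (1 - g) + k * g
    return mean_r, mean_rp


def exact_mean(n, k, theta, t):
    """Expected location after t steps, by propagating the full distribution."""
    dist = {(-k, k): 1.0}
    for _ in range(t):
        new = {}
        for (r, rp), mass in dist.items():
            moves = {
                (r, rp + 2): (2 * n - (rp + k)) / (4 * n) * theta,
                (r, rp - 2): (rp - k) / (4 * n),
                (r - 2, rp): (k + r) / (4 * n) * theta,
                (r + 2, rp): (k - r) / (4 * n),
            }
            moves[(r, rp)] = 1.0 - sum(moves.values())  # holding probability
            for state, p in moves.items():
                if p > 0:
                    new[state] = new.get(state, 0.0) + mass * p
        dist = new
    mean_r = sum(mass * r for (r, _), mass in dist.items())
    mean_rp = sum(mass * rp for (_, rp), mass in dist.items())
    return mean_r, mean_rp
```

Agreement of the two functions for several $t$ reflects the linearity of the transition probabilities in $(r,r')$, which is what makes the induction work.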



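Corollary 5.6. For the expectation under stationarity in the parametrization (19) of our two-dimensional chain $(Z_t)$ we get

\[ E_\pi\big(Z^{(r)}, Z^{(r')}\big) = \left( \frac{2n\beta}{1+\theta},\ \frac{2n(\theta+\beta)}{1+\theta} \right) = \left( k\,\frac{1-\theta}{1+\theta},\ \ \frac{2n\theta}{1+\theta} + k\,\frac{1-\theta}{1+\theta} \right). \]

Proof: Since the Markov chain $(Z_t)$ is irreducible and aperiodic, it converges to its unique stationary distribution as $t$ goes to infinity for fixed $n, k$. Since the state space is finite, this convergence also holds in $L_1$. So by the Proposition, we get

\[ \lim_{t \to \infty} E_k\big(Z_t^{(r)}, Z_t^{(r')}\big) = \left( \frac{2n\beta}{1+\theta},\ \frac{2n(\theta+\beta)}{1+\theta} \right), \]

since for fixed $n, k$ we have $\gamma^t \to 0$ as $t$ goes to infinity. □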

Remark 5.7: By reversing the linear transformation $(l,l') \mapsto (l'-l,\ l'+l)$, we immediately get

\[ E_\pi(Z) = \left( \frac{n\theta}{1+\theta},\ \ \frac{n\theta + k(1-\theta)}{1+\theta} \right) \]

for the expectation under stationarity in the original parametrization (18) of our two-dimensional chain $(Z_t)$. For the first coordinate this confirms what we already know from $S(X) \sim \mathrm{Binomial}\big(n, \frac{\theta}{1+\theta}\big)$ under stationarity.
Remark 5.8: By exactly the same proof we get for a general starting state $(r,r')$ in the new parametrization (19) that

\[ E_{r,r'}\big(Z_t^{(r)}, Z_t^{(r')}\big) = \left( \frac{2n\beta}{1+\theta}\,(1-\gamma^t) + r\gamma^t,\ \ \frac{2n(\theta+\beta)}{1+\theta}\,(1-\gamma^t) + r'\gamma^t \right). \]

Now from Remark 3.3 on page 7 we know that for $u := u_{n,\theta} := \frac{1}{1+\theta}\, n \log n$ we get $\gamma^u \sim n^{-1/2}$, meaning that the ratio of these two quantities goes to one as $n$ goes to infinity. Therefore, Proposition 5.5 and its Corollary imply that for any starting state the expected location of the chain $(Z_t)$ after $u$ steps is within $O(\sqrt{n})$ of the expected location of the chain under stationarity. To see this, just subtract the stationary expectation from the expectation after $u$ steps and note that both $\frac{2n\beta}{1+\theta} + k$ and $\frac{2n(\theta+\beta)}{1+\theta} - k$ are in $O(n)$. We now want to show that an additional $an$ steps are enough to couple two chains that are at distance $O(\sqrt{n})$ from their stationary mean. This will follow from a corresponding result for lazy simple random walk on $\mathbb{Z}^2$, since close to the stationary mean we are now in the "entropy regime" where the drift of the chain is negligible, so it behaves similarly to lazy simple random walk.
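
The asymptotic relation $\gamma^u \sim n^{-1/2}$ is easy to inspect numerically. The following sketch assumes $\gamma = 1 - \frac{1+\theta}{2n}$ and $u = \frac{1}{1+\theta}\, n \log n$ treated as a real number (no rounding to an integer number of steps); the function name is hypothetical.

```python
import math


def gamma_power_ratio(n, theta):
    """Return gamma^u * sqrt(n), which should approach 1 as n grows."""
    gamma = 1 - (1 + theta) / (2 * n)
    u = n * math.log(n) / (1 + theta)
    return gamma ** u * math.sqrt(n)
```

Expanding the logarithm gives $\log(\gamma^u \sqrt{n}) = -\frac{(1+\theta)\log n}{8n} + O\big(\frac{\log n}{n^2}\big)$, so the ratio approaches one from below at rate $\frac{\log n}{n}$.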

For a pair of lazy simple random walks on $\mathbb{Z}^2$ we can couple the two coordinates of the chains "one by one" in a coupling of Ornstein type as follows: Let $V_t = (V_t^{(r)}, V_t^{(r')})$ and $W_t = (W_t^{(r)}, W_t^{(r')})$ be two versions of lazy simple random walk on $\mathbb{Z}^2$, i.e. the transition probabilities are $p(x,x) = 1/2$ and $p(x,y) = 1/8$ for each of the four neighbors $y$ of $x$. Without loss of generality suppose that $V_0^{(r)} \ne W_0^{(r)}$.

Phase I: ($V_t^{(r)} \ne W_t^{(r)}$) We try to couple the $r$-coordinates by running an independence coupling for this coordinate while moving the $r'$-coordinates in "lockstep", leaving $|V_t^{(r')} - W_t^{(r')}|$ constant: Flip fair coin number one. If it comes up heads, we move in the $r$-coordinate: Flip another fair coin to decide which of the two chains to move according to one-dimensional (non-lazy) simple random walk for the $r$-coordinate. The other chain stays at its current location. If fair coin number one comes up tails, we move in the $r'$-coordinate: Flip another fair coin. If it comes up heads, both chains stay where they are; if it comes up tails, both chains move in the same direction on the $r'$-coordinate according to (non-lazy) simple random walk. Note that $|V_t^{(r)} - W_t^{(r)}|$ performs a one-dimensional lazy simple random walk, while $|V_t^{(r')} - W_t^{(r')}|$ stays constant. Run in Phase I until $V_t^{(r)} = W_t^{(r)}$.

Phase II: ($V_t^{(r)} = W_t^{(r)}$) Now we switch the role of the two coordinates: We try to couple the $r'$-coordinates by running an independence coupling for this coordinate, while moving the $r$-coordinates in "lockstep", leaving $|V_t^{(r)} - W_t^{(r)}| = 0$. Flip fair coin number one. If it comes up heads, we move in the $r'$-coordinate: Flip another fair coin to decide which of the two chains to move according to one-dimensional (non-lazy) simple random walk for the $r'$-coordinate. The other chain stays at its current location. If fair coin number one comes up tails, we move in the $r$-coordinate: Flip another fair coin. If it comes up heads, both chains stay where they are; if it comes up tails, both chains move in the same direction on the $r$-coordinate according to (non-lazy) simple random walk. Note that $|V_t^{(r')} - W_t^{(r')}|$ performs a one-dimensional lazy simple random walk, while $|V_t^{(r)} - W_t^{(r)}|$ stays constant equal to zero. Run in this phase until $V_t^{(r')} = W_t^{(r')}$ and therefore $V_t = W_t$. □
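
The two-phase coupling just described can be simulated directly. The following is a minimal sketch, not a definitive implementation: `ornstein_coupling` is a hypothetical name, states are pairs $(r, r')$ of integers, and the run is capped at `max_steps` because the coupling time has a heavy tail. The function also records the two invariants stated in the text (constant $r'$-gap during Phase I, matched $r$-coordinates during Phase II).

```python
import random


def ornstein_coupling(v, w, max_steps, seed=0):
    """Simulate the two-phase coupling of two lazy simple random walks on Z^2.

    Returns the coupling time (None if not coupled within max_steps) and
    flags recording whether the invariants of Phase I / Phase II held.
    """
    rng = random.Random(seed)
    v, w = list(v), list(w)
    initial_rp_gap = abs(v[1] - w[1])
    phase1_gap_constant = True
    phase2_r_matched = True
    time = None
    for step in range(max_steps):
        if v == w:
            time = step
            break
        # Phase I couples coordinate 0 (r); Phase II couples coordinate 1 (r').
        active, frozen = (0, 1) if v[0] != w[0] else (1, 0)
        if rng.random() < 0.5:
            # Coin one heads: independence coupling, exactly one chain moves.
            mover = v if rng.random() < 0.5 else w
            mover[active] += rng.choice((-1, 1))
        elif rng.random() < 0.5:
            # Coin one tails, coin two tails: both chains move in lockstep.
            d = rng.choice((-1, 1))
            v[frozen] += d
            w[frozen] += d
        if active == 0:
            phase1_gap_constant &= abs(v[1] - w[1]) == initial_rp_gap
        else:
            phase2_r_matched &= v[0] == w[0]
    if time is None and v == w:
        time = max_steps
    return {
        "time": time,
        "phase1_gap_constant": phase1_gap_constant,
        "phase2_r_matched": phase2_r_matched,
    }
```

Each chain on its own moves to each of its four neighbors with probability $1/8$ and holds with probability $1/2$, so the marginals are those of lazy simple random walk on $\mathbb{Z}^2$.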

So coupling a pair of two-dimensional (lazy) simple random walks can be achieved by coupling 
two pairs of one-dimensional (lazy) simple random walks, one after the other. Therefore we start 
with the following result for one-dimensional chains: 

Proposition 5.9. Let $(V_t)$ be $(1-2\delta)$-lazy simple random walk on $\mathbb{Z}$, started at $V_0 = k$. That is, for $\delta \in (0, \frac12]$ we have $V_t = k + \sum_{i=1}^{t} \xi_i$, where the $\xi_i$ are iid with $P\{\xi_i = \pm 1\} = \delta$, $P\{\xi_i = 0\} = 1 - 2\delta$. Let $\tau_0 := \min\{t' \ge 0 : V_{t'} = 0\}$ be the first time the walk hits zero. Then there exists a constant $C$ such that for all $k \in \mathbb{Z}$ and $r \in \mathbb{N}$ we get

\[ P_k\{\tau_0 > r\} \le \frac{C\,|k|}{\sqrt{\delta r}}. \tag{21} \]

This is a straightforward generalization of the corresponding well known result for simple random walk ($\delta = \frac12$). For completeness, we give a proof below. We follow the proof of Corollary 2.28 on page 36 in Levin et al. (2009) for the case where $\delta = \frac14$. It is a consequence of the following result (see Theorem 2.26 on page 35 in Levin et al. (2009) for a proof).

Theorem 5.10. Let $(\Delta_i)$ be iid integer-valued random variables with mean zero and variance $\sigma^2$. Let $X_t = \sum_{i=1}^{t} \Delta_i$, with $X_0 = 0$. Then

\[ P\{X_t \ne 0 \text{ for } 1 \le t \le r\} \le \frac{4\sigma}{\sqrt{r}}. \]

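Proof of Proposition 5.9: For $k = 0$ the claim is true, so by symmetry we may assume that $k \ge 1$. Define $\tau_0^+ := \min\{t' \ge 1 : V_{t'} = 0\}$ and $\tau_k := \min\{t' \ge 0 : V_{t'} = k\}$ for $k \in \mathbb{Z}$. Then by conditioning on the first step of the walk started at zero, we get from symmetry that

\begin{align*}
P_0\{\tau_0^+ > r\} &= \delta P_1\{\tau_0^+ > r-1\} + \delta P_{-1}\{\tau_0^+ > r-1\} \\
&= 2\delta P_1\{\tau_0^+ > r-1\}.
\end{align*}

Therefore

\begin{align*}
P_1\{\tau_k < \tau_0\}\, P_k\{\tau_0 > r\} &= P_1\{\tau_k < \tau_0 \text{ and don't hit zero for } r \text{ steps after hitting } k\} \\
&\le P_1\{\tau_0^+ > r-1\} \\
&\le \frac{1}{2\delta} \cdot \frac{4\sqrt{2\delta}}{\sqrt{r}} = \frac{2\sqrt{2}}{\sqrt{\delta r}}.
\end{align*}

Here, the first inequality follows since the first event is contained in the second. The second inequality follows from Theorem 5.10, applied with $\sigma^2 = 2\delta$. Since $k \ge 1$, we get $P_1\{\tau_k < \tau_0\} = \frac{1}{k}$ by Gambler's ruin, so

\[ P_k\{\tau_0 > r\} = P_k\{\tau_0^+ > r\} \le \frac{2\sqrt{2}\,k}{\sqrt{\delta r}}. \qquad \square \]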
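
The bound can be compared with the exact tail probability, which is computable by dynamic programming for moderate $r$. The sketch below assumes the explicit constant $2\sqrt{2}$ that the proof produces; `tail_hitting_zero` is a hypothetical helper name.

```python
import math
import numpy as np


def tail_hitting_zero(k, r, delta):
    """Exact P_k{tau_0 > r} for the walk with P(+1) = P(-1) = delta, k >= 1."""
    size = k + r + 2                # the walk cannot exceed k + r in r steps
    mass = np.zeros(size)
    mass[k] = 1.0                   # mass of paths that have not yet hit zero
    for _ in range(r):
        new = (1 - 2 * delta) * mass
        new[1:] += delta * mass[:-1]    # steps up
        new[:-1] += delta * mass[1:]    # steps down
        new[0] = 0.0                    # paths reaching zero are removed
        mass = new
    return float(mass.sum())


def bound(k, r, delta):
    """The bound 2*sqrt(2)*k / sqrt(delta*r) obtained in the proof above."""
    return 2 * math.sqrt(2) * k / math.sqrt(delta * r)
```

For example, `tail_hitting_zero(1, 400, 0.25)` can be compared with `bound(1, 400, 0.25)`; the exact value is considerably smaller, which is consistent with the bound not being tight in its constant.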

Next we show that by running the chain for an extra $an$ steps (burn-in period), we may assume that the number of ones in the state that is used in the two-dimensional projection is close to its stationary mean. Fix $\delta > 0$ and let $p := \frac{\theta}{1+\theta}$. We will show that

\[ \max_{x \in \{0,1\}^n} \|P^{t+an}(x,\cdot) - \pi\| \le \max_{y : S(y) \in n(p \pm \delta)} \|P^t(y,\cdot) - \pi\| + o(1), \tag{22} \]

where the $o(1)$ term goes to zero as $n$ goes to infinity. To see this, we condition on where we are after the first $an$ steps:

\begin{align*}
\|P^{t+an}(x,\cdot) - \pi\| &= \Big\| \sum_{y} P^{an}(x,y)\,\big[P^t(y,\cdot) - \pi\big] \Big\| \\
&\le \sum_{y : S(y) \in n(p \pm \delta)} P^{an}(x,y)\, \|P^t(y,\cdot) - \pi\| + \sum_{y : S(y) \notin n(p \pm \delta)} P^{an}(x,y)\, \|P^t(y,\cdot) - \pi\| \\
&\le \max_{y : S(y) \in n(p \pm \delta)} \|P^t(y,\cdot) - \pi\| + P_x\{S(X_{an}) \notin n(p \pm \delta)\}.
\end{align*}

The last term on the right hand side is in fact in $o(1)$ because when writing $S_{an} := S(X_{an})$ we get for large $a$ that

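\begin{align*}
P_x\{S(X_{an}) \notin n(p \pm \delta)\} &= P_x\{|S_{an} - E_x S_{an} + E_x S_{an} - np| > \delta n\} \\
&\le P_x\{|S_{an} - E_x S_{an}| > \delta n - |E_x S_{an} - np|\} \\
&\le P_x\Big\{|S_{an} - E_x S_{an}| > n\Big(\delta - \exp\Big\{-\frac{1+\theta}{2}\,a\Big\}\Big)\Big\} \\
&\le 2 \exp\left\{ -\frac{2n^2 \big(\delta - \exp\{-\frac{1+\theta}{2}\,a\}\big)^2}{9\,an} \right\} \\
&= o(1).
\end{align*}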
Here the second inequality holds since we get from Proposition 3.2 that $|E_x S_{an} - np| \le n\gamma^{an} \le n \exp\{-\frac{1+\theta}{2}\,a\}$. The third inequality follows from our concentration result based on Azuma's inequality (Proposition 4.4). This proves (22). So after an initial $an$ steps we may assume that the number of ones is within $\delta n$ of its stationary mean $np$.

Now we project down to our two-dimensional chain: Fix any $y \in \{0,1\}^n$ such that $k := S(y) \in n(p \pm \delta)$. Writing the transition kernel in the parametrization (19) we get from Corollary 5.2 that

\begin{align*}
\|P^t(y,\cdot) - \pi\| &= \|\mathcal{D}({}_{S(y)} Z_t) - \pi_{Z_y}\| \\
&= \|P^t((-k,k),\cdot) - \pi_{Z_y}\| \\
&\le \max_{(v,v')} \|P^t((v,v'),\cdot) - \pi_{Z_y}\| \tag{23} \\
&\le \max_{(v,v'),(w,w')} \|P^t((v,v'),\cdot) - P^t((w,w'),\cdot)\| \\
&\le \max_{(v,v'),(w,w')} P_{(v,v'),(w,w')}\{\tau > t\}.
\end{align*}

The maxima here are over the entire state space of the two-dimensional chain, $(v,v'), (w,w') \in \{-k,\dots,k\} \times \{k,\dots,2n-k\}$. The second inequality above is well known, see for example Lemma 4.11 in Levin et al. (2009, page 53). The last inequality is the coupling inequality, where $\tau := \min\{j \ge 0 : Z_j = Y_j\}$ is the coupling time in the coupling $(Z_j, Y_j)$ that we now describe.

Fix any $(v,v'), (w,w') \in \{-k,\dots,k\} \times \{k,\dots,2n-k\}$ and set $Z_0 := (v,v')$ and $Y_0 := (w,w')$. Let $t = s + u$, where $s := \frac{1}{1+\theta}\, n \log n$ and $u := an$. For steps $j = 1, 2, \dots, s$ we use an independence coupling, i.e. at each step we flip a fair coin to decide which chain to move according to the non-lazy version of its transition kernel. The other chain stays at its current location. Here, if $P$ is the transition probability matrix (19) of the chain $(Z_j)$, then $P' := 2P - I$ is its non-lazy version, where $I$ is the identity. However, if $Z_j$ and $Y_j$ ever agree in the $r$ (or $r'$) coordinate, we modify the coupling so that they agree in that coordinate forever after. This is possible since the probability of moving up (or down) in the $r$-coordinate does not depend on the current location in the $r'$-coordinate. Similarly, the probability of moving up (or down) in the $r'$-coordinate does not depend on the current location in the $r$-coordinate. This is easily seen from the transition kernel (19).

We could implement this change as follows: Suppose $Y_j^{(r)} = Z_j^{(r)}$. Flip fair coin number one; if it comes up heads, try to move $Y_j$ according to its non-lazy transition rule. If that would result in $Y_j$ moving up (or down) in the $r$-coordinate, flip another fair coin. If it comes up heads, move $Y_j^{(r)}$ accordingly and move $Z_j^{(r)}$ in the same way; if it comes up tails, reject the move. If fair coin number one comes up tails, try to move $Z_j$ according to its non-lazy transition rule. If that would result in $Z_j$ moving up (or down) in the $r$-coordinate, flip another fair coin. If it comes up heads, move $Z_j^{(r)}$ accordingly and move $Y_j^{(r)}$ in the same way; if it comes up tails, reject the move. Similarly for the $r'$-coordinate.
For steps $j = s+1, s+2, \dots, s+u$ we couple $(Z_j, Y_j)$ with an Ornstein-type coupling $(V_j, W_j)$ of two lazy simple random walks on $\mathbb{Z}^2$ as described above, where now the laziness-factor (the probability of not moving) at step $j$ depends on the current location of $(Z_j, Y_j)$. Write $D_j^{(r)} := Z_j^{(r)} - Y_j^{(r)}$ and $D_j^{(r')} := Z_j^{(r')} - Y_j^{(r')}$ for the distance between $Z_j$ and $Y_j$ in the $r$ and $r'$ coordinates at time $j$, respectively. Similarly we write $C_j^{(r)} := V_j^{(r)} - W_j^{(r)}$ and $C_j^{(r')} := V_j^{(r')} - W_j^{(r')}$. To start, let $V_s := Z_s$ and $W_s := Y_s$, so that $D_s^{(r)} = C_s^{(r)}$ and $D_s^{(r')} = C_s^{(r')}$.

Note that the chains $(Z_j), (Y_j)$ have a (small) drift towards their stationary mean. A short calculation based on (19) and Corollary 5.6 shows that this drift is linear in the distance to the stationary mean. Since $(V_j), (W_j)$ don't have a drift, this coupling can be done in such a way that at all times and in both coordinates, the distance between $Z_j$ and $Y_j$ is no greater than the distance between $V_j$ and $W_j$. Throughout, once the two chains meet (in the $r$ or $r'$ coordinate), we let them stay together in that coordinate forever, i.e. if $Y_j^{(r)} = Z_j^{(r)}$ then $Y_i^{(r)} = Z_i^{(r)}$ for all $i \ge j$. Similarly for the $r'$-coordinate. This can be implemented as follows:

Phase I: ($V_j^{(r)} \ne W_j^{(r)}$) We first try to couple the $r$-coordinates of $V_j$ and $W_j$ by running an independence coupling for this coordinate while moving the $r'$-coordinates in "lockstep", leaving $|C_j^{(r')}| = |V_j^{(r')} - W_j^{(r')}|$ constant. To this end, let

\[ \rho_j := \frac{P\{Y_j^{(r)} \text{ moves} \mid Y_j\} + P\{Z_j^{(r)} \text{ moves} \mid Z_j\}}{2} = \frac{k}{n}\,\frac{1+\theta}{4} - \frac{Y_j^{(r)} + Z_j^{(r)}}{2n}\,\frac{1-\theta}{4} \tag{24} \]

be the average of the probabilities that $Y_j$ respectively $Z_j$ moves in the $r$-coordinate. Note that $\rho_j$ is random and depends on the current location of $Y_j^{(r)}$ and $Z_j^{(r)}$. Throw a $\mathrm{Ber}(2\rho_j)$ coin. If it comes up heads, we move in the $r$-coordinate:

• If $Y_j^{(r)} \ne Z_j^{(r)}$, use one Uniform[0, 1] random variable to pick one of the four possible moves $Y_j^{(r)}$ up, $Y_j^{(r)}$ down, $Z_j^{(r)}$ up, $Z_j^{(r)}$ down; let the same Uniform[0, 1] variable determine one of the four possible moves $V_j^{(r)}$ up, $V_j^{(r)}$ down, $W_j^{(r)}$ up, $W_j^{(r)}$ down. This can be done in such a way that $|D_{j+1}^{(r)}| \le |C_{j+1}^{(r)}|$.

• If $Y_j^{(r)} = Z_j^{(r)}$ we have $\rho_j = P\{Y_j^{(r)} \text{ moves} \mid Y_j\} = P\{Z_j^{(r)} \text{ moves} \mid Z_j\}$ and we can move $Y_j^{(r)}$ and $Z_j^{(r)}$ either both up, or both down, or both stay. At the same time, independently pick one of the four possible moves $V_j^{(r)}$ up, $V_j^{(r)}$ down, $W_j^{(r)}$ up, $W_j^{(r)}$ down with equal probability.

If our $\mathrm{Ber}(2\rho_j)$ coin comes up tails, we move in the $r'$-coordinate (or we don't move). This can be done in such a way that $|C_{j+1}^{(r')}| = |C_j^{(r')}|$ and $|D_{j+1}^{(r')}| \le |D_j^{(r')}|$. Overall, this coupling in Phase I ensures that $|D_j| \le |C_j|$ for all $j = 1, 2, \dots$ and in both coordinates $r$ and $r'$. Furthermore, $(|C_j^{(r)}|)$ performs a lazy simple random walk where the probability of not moving is $1 - 2\rho_j$; this probability is a random variable that depends on the path of $(Y_j^{(r)}, Z_j^{(r)})$.

Phase II: ($V_j^{(r)} = W_j^{(r)}$) We have the $r$-coordinates matched, $D_j^{(r)} = C_j^{(r)} = 0$, and want to keep them that way. At the same time, we try to couple the $r'$-coordinates. To this end, let

\[ \sigma_j := \frac{P\{Y_j^{(r')} \text{ moves} \mid Y_j\} + P\{Z_j^{(r')} \text{ moves} \mid Z_j\}}{2} = \frac{\theta}{2} - \frac{k}{n}\,\frac{1+\theta}{4} + \frac{Z_j^{(r')} + Y_j^{(r')}}{2n}\,\frac{1-\theta}{4} \tag{25} \]

be the average of the probabilities that $Y_j$ respectively $Z_j$ moves in the $r'$-coordinate. Note that $\sigma_j$ is random and depends on the current location of $Y_j^{(r')}$ and $Z_j^{(r')}$. Throw a $\mathrm{Ber}(2\sigma_j)$ coin. If it comes up heads, we move in the $r'$-coordinate:

• If $Y_j^{(r')} \ne Z_j^{(r')}$, use one Uniform[0, 1] random variable to pick one of the four possible moves $Y_j^{(r')}$ up, $Y_j^{(r')}$ down, $Z_j^{(r')}$ up, $Z_j^{(r')}$ down; let the same Uniform[0, 1] variable determine one of the four possible moves $V_j^{(r')}$ up, $V_j^{(r')}$ down, $W_j^{(r')}$ up, $W_j^{(r')}$ down. This can be done in such a way that $|D_{j+1}^{(r')}| \le |C_{j+1}^{(r')}|$.

• If $Y_j^{(r')} = Z_j^{(r')}$ we have $\sigma_j = P\{Y_j^{(r')} \text{ moves}\} = P\{Z_j^{(r')} \text{ moves}\}$ and we can move $Y_j^{(r')}$ and $Z_j^{(r')}$ either both up, or both down, or both stay. At the same time, independently pick one of the four possible moves $V_j^{(r')}$ up, $V_j^{(r')}$ down, $W_j^{(r')}$ up, $W_j^{(r')}$ down with equal probability.

If our $\mathrm{Ber}(2\sigma_j)$ coin comes up tails, we move in the $r$-coordinate (or we don't move). This can be done in such a way that $|C_{j+1}^{(r)}| = |C_j^{(r)}| = 0$ and $|D_{j+1}^{(r)}| = |D_j^{(r)}| = 0$. Overall, this coupling in Phase II ensures that $|D_j^{(r')}| \le |C_j^{(r')}|$ and $|D_j^{(r)}| = |C_j^{(r)}| = 0$ for all $j = 1, 2, \dots$. Furthermore, $(|C_j^{(r')}|)$ performs a lazy simple random walk, where the probability of not moving is the random variable $1 - 2\sigma_j$ that depends on the path of $(Y_j^{(r')}, Z_j^{(r')})$.

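By (22) and (23), we may assume that $\frac{k}{n} \in p \pm \delta$ for any $\delta > 0$. As a consequence, the (random) laziness factors for the simple random walks $(|C_j^{(r)}|)$ and $(|C_j^{(r')}|)$ are bounded away from one, since in that case

\begin{align*}
P\{C_j^{(r)} \text{ moves} \mid Y_j, Z_j\} &= 2\rho_j \ge \frac{k}{n}\,\theta \ge \left( \frac{\theta}{1+\theta} - \delta \right) \theta =: 2\delta_r > 0, \\
P\{C_j^{(r')} \text{ moves} \mid Y_j, Z_j\} &= 2\sigma_j \ge \left( 1 - \frac{k}{n} \right) \theta \ge \left( \frac{1}{1+\theta} - \delta \right) \theta =: 2\delta_{r'} > 0,
\end{align*}

where we picked some $\delta \in (0, \frac{\theta}{1+\theta})$ to ensure the two strict inequalities. This follows from the definitions of $\rho_j$ and $\sigma_j$ by noting that the $r$-coordinates lie in $[-k, k]$ while the $r'$-coordinates lie in $[k, 2n-k]$.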

Let $\tilde\tau := \min\{j \ge 0 : V_j = W_j\}$ be the coupling time of $(V_j, W_j)$. Denote with $\tilde\tau_r$ and $\tilde\tau_{r'}$ the time we spend in Phase I and Phase II of the coupling described above, so that $\tilde\tau = \tilde\tau_r + \tilde\tau_{r'}$. That means $\tilde\tau_r$ (respectively $\tilde\tau_{r'}$) is the time it takes for the $r$ (respectively $r'$) coordinates of $V_j$ and $W_j$ to couple, i.e. the time it takes for the lazy simple random walk $(|C_j^{(r)}|)$ (respectively $(|C_j^{(r')}|)$) to hit zero. (Again, the laziness factors here are random, depending on the path of $(Z_j, Y_j)$.)

Now we compare these two lazy one-dimensional simple random walks to two other one-dimensional simple random walks that are uniformly lazier, but with deterministic laziness factor. For the walk corresponding to the $r$-chain the probability of an upward move will be $\delta_r$. For the walk corresponding to the $r'$-chain the probability of an upward move will be $\delta_{r'}$. Let $\hat\tau_r$ (respectively $\hat\tau_{r'}$) be the first time this deterministically lazy chain hits zero. By letting it follow the path of $(|C_j^{(r)}|)$ (respectively $(|C_j^{(r')}|)$) at a slower pace, we can ensure that $\tilde\tau_r \le \hat\tau_r$ and $\tilde\tau_{r'} \le \hat\tau_{r'}$ almost surely. The starting point of the walks will be indicated as a subscript in the probability measure.

Now we put things together. By conditioning on where we are after the first $s$ steps (the drift regime of our coupling), we get

\begin{align*}
P_{(v,v'),(w,w')}\{\tau > s+u \mid Z_s, Y_s\} &= 1\{\tau > s\}\, P_{Z_s,Y_s}\{\tau > u\} \\
&\le P_{Z_s,Y_s}\{\tilde\tau > u\} \\
&\le P_{Z_s,Y_s}\{\tilde\tau_r > u/2\} + P_{Z_s,Y_s}\{\tilde\tau_{r'} > u/2\} \\
&\le P_{D_s^{(r)}}\{\hat\tau_r > u/2\} + P_{D_s^{(r')}}\{\hat\tau_{r'} > u/2\} \\
&\le \frac{C\,|D_s^{(r)}|}{\sqrt{\delta_r u}} + \frac{C\,|D_s^{(r')}|}{\sqrt{\delta_{r'} u}}.
\end{align*}

Here the first inequality holds since $V_j = W_j$ implies $Z_j = Y_j$ by the construction of our coupling. The last inequality follows from Proposition 5.9, where $C$ is a positive constant. By taking expectation, we get for large $n$ that



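\begin{align*}
P_{(v,v'),(w,w')}\{\tau > s+u\} &\le \frac{C\,E_{(v,v'),(w,w')}|D_s^{(r)}|}{\sqrt{\delta_r u}} + \frac{C\,E_{(v,v'),(w,w')}|D_s^{(r')}|}{\sqrt{\delta_{r'} u}} \\
&= \frac{C\,|E_{(v,v'),(w,w')} D_s^{(r)}|}{\sqrt{\delta_r u}} + \frac{C\,|E_{(v,v'),(w,w')} D_s^{(r')}|}{\sqrt{\delta_{r'} u}} \\
&= \frac{C\,|v-w|\,\gamma^s}{\sqrt{\delta_r u}} + \frac{C\,|v'-w'|\,\gamma^s}{\sqrt{\delta_{r'} u}} \\
&\le \frac{C'\sqrt{n}}{\sqrt{an}} + \frac{C'\sqrt{n}}{\sqrt{an}} \\
&\le \frac{C''}{\sqrt{a}}.
\end{align*}

Here $E_{(v,v'),(w,w')}|D_s^{(r)}| = |E_{(v,v'),(w,w')} D_s^{(r)}|$ in the first equality follows since $D_0^{(r)} \ge 0$ implies $D_j^{(r)} \ge 0$ for all $j \le s$ and $D_0^{(r)} \le 0$ implies $D_j^{(r)} \le 0$ for all $j \le s$, by the construction of our coupling. Similarly for the $r'$-coordinate. (In both coordinates $r$ and $r'$, the paths of $Z_j$ and $Y_j$ never cross.) The second equality above uses Remark 5.8. In the second inequality we use the fact that $\gamma^s \sim 1/\sqrt{n}$, as shown in Remark 3.3, together with $|v-w| \le 2k$ and $|v'-w'| \le 2n$.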

Combining this with (22), (23), it follows for $t = s + u$ as above that for some constant $C$ we get

\begin{align*}
\lim_{a \to \infty} \limsup_{n \to \infty} \max_{x \in \{0,1\}^n} \|P^{t+an}(x,\cdot) - \pi\| &\le \lim_{a \to \infty} \limsup_{n \to \infty} \max_{(v,v'),(w,w')} P_{(v,v'),(w,w')}\{\tau > s+u\} \\
&\le \lim_{a \to \infty} \frac{C}{\sqrt{a}} \\
&= 0.
\end{align*}

This finishes the proof of the upper bound part of Theorem 2.1 and we are done. □